Buyer beware


An investigation by Nature shows the scale of the market for unapproved stem-cell therapies in 
China. Hype and unrealistic hope must not be allowed to undermine genuine promise. 


On the Internet, you can find advertisements for stem-cell remedies for every kind of disease or injury. Companies also promise that the cells will improve appearance or provide a ‘rejuvenating’ energy boost. The message — that stem-cell therapies need some work, but are an accepted part of medicine — is as clear as it is wrong.

But repeat this mantra enough — as it is repeated endlessly online — and the promises can start to seem real. In some places, they certainly look real. As we reveal on page 149, these ‘cures’ are offered in real clinics in China, where real nurses and doctors inject people with stem cells in various formulations from various sources — apparently convinced that they are helping patients. It looks and feels routine.

China has tried to crack down on unapproved treatments and it is not the only place where patients can buy these therapies: stem-cell companies also take advantage of gaps in regulation enforcement in the United States (see Nature 483, 13-14; 2012). But in China the problem is more widespread.

Promoters of such unproven and unapproved ‘treatments’ liken stem-cell therapy to other once-revolutionary therapies, such as organ transplantation. Doctors confess that they can’t guarantee that the stem cells will work, but they do guarantee that the procedures are safe. If they weren’t, say advocates, we would hear about it. So why not try?

This circular logic makes the apparent infiltration of stem-cell technologies into the medical mainstream even more worrisome. The more willing patients and medics are to believe, the less they look for true clinical data, and the less doctors are forced to produce it.

Compare the emergence of stem-cell therapies with the introduction of psychosurgery. Like stem-cell practitioners today, doctors in the 1930s and 1940s felt that the need for lobotomy was urgent enough to bypass the requirement for clinical evidence. Results were reviewed
selectively, with pacifying brain damage sometimes taken as a stabilizing ‘cure’ for schizophrenia or nervous disorders. One promoter even shared the 1949 Nobel Prize in Physiology or Medicine for his part in developing the procedure. There was no long-term follow-up. In the end, doctors mutilated the brains of thousands of patients over the course of decades before critics were able to cast enough doubt on lobotomy to halt its use. The widespread acceptance made it difficult for people to realize that these procedures were actually doing harm: you don't see a problem if you don't bother looking for it.

Of course, there is much legitimate research into stem cells, including many controlled clinical trials. It would be a shame if they were tainted by association with historical failures. However, judging from Nature’s investigation in China, acceptance is already overtaking clinical evidence, with no attempt at systematic follow-up of treatments. If stem-cell therapies result in cancer or immunological disease, no one will know.

This does not stop people from outside China flocking to the country to take advantage of the stem-cell therapies offered there and promoted online with glowing endorsements. The clinics are certainly set up to make foreigners feel at home. Set aside from the teeming Chinese hospitals, stem-cell treatment centres have orderly nurse stations, well-lit rooms and good bedside care. What is lacking is controlled clinical trials, reliable data and government approval. If the dedicated medical workers at the clinics don’t see the problems, they need to look harder. If they really want to help their patients, they should seek to prove that the treatments work, rather than just assuming that they do.


Honest work 


The plagiarism police deserve thanks for 
defending the honour of the PhD. 


Last week, Hungary's President Pál Schmitt was forced to resign because of plagiarism detected in his 1992 PhD thesis on physical education. Tivadar Tulassay, rector of Budapest’s prestigious Semmelweis University, showed admirable courage by standing up to the Hungarian establishment to revoke the thesis a few days earlier, after experts appointed by the university declared that Schmitt's thesis “failed to meet scientific and ethical standards”. Tulassay, a cardiovascular researcher, has since assumed personal responsibility for his university’s decision to revoke Schmitt's title.


The affair has remarkable parallels with that of Germany’s former 
defence minister, Karl-Theodor zu Guttenberg, who resigned in 
March last year after his own PhD thesis, in law, had been revoked by 
the University of Bayreuth. 

Like Schmitt, zu Guttenberg tried at first to deny plagiarism charges, 
then to underplay them, and he enjoyed powerful political support — 
until protests by a movement of honest PhD holders made his situation
untenable. Plagiarism hunters have other prominent personalities in 
their sights, and are not necessarily going to be stopped just because a 
thesis is not in electronic form — if suspicion is high, they will digitize 
it themselves. 

In many central European countries, an academic title is a decided 
advantage for a political career; clearly, some ambitious politicians 
think nothing of obtaining such a title by cheating. We can thank the 
plagiarism hunters — whatever their individual motives — for exposing 
dishonesty among those who govern us, and for defending the honour of 
a PhD. The only safe doctorate these days is an honestly acquired one.
WORLD VIEW

Scientists and bankers — a new model army

Bankers must now surrender more information on their activities. Scientists should use it to build better system-wide financial models, says John Liechty.

The financial system is in a credit-confidence trap. Like a badly balanced ship trying not to capsize during a storm, banks and financiers are unwilling to make loans or accept collateral in exchange for securing debts — they fear being overwhelmed by the next wave of crisis. Even though the first sovereign default, in Greece, has passed, there is no obvious port in this storm.

How did the system get so far out of balance? And how can scientists — accustomed to modelling complex events to explain and make predictions — help to create a financial system that is more self-stabilizing? Existing financial models failed to predict the crisis of 2008 and the follow-on crisis of 2011-12. They missed the huge system-wide risks that developed as banks promoted an undisciplined supply of mortgages and created an increasingly complex web of relationships through legal contracts that transferred risk throughout the financial markets.

Financial bonds based on these mortgages (and other assets) were seen as risk-free and cheap, and banks used them as both capital and collateral. But when house prices plummeted, the bonds were useless for securing even short-term loans for the banks, which suddenly faced cash shortages. The banks held assets that were potentially worthless, and they were all interconnected. If one firm went down, everyone else was vulnerable.

Market forces function only if all risks are fairly priced. The system-wide risks that these bonds held were not taken into account, so the bonds were sold too cheaply. A clear scientific goal, therefore, is to build better system-wide models of the global financial system. Both the industry and regulators could use such models to judge financial risk and make decisions.

True, scientists are not blameless with regard to the recent collapse. They helped to create the models that the banks routinely used to measure risk. But those models lacked crucial data — on common holdings and trading behaviours, for instance, and on the interconnections between firms and the capacity of markets to execute trades.

For commercial reasons, banks have historically been reluctant to share this kind of information, but that is changing. Legislation in the United States now allows regulators to collect such data from banks, pension funds, insurance companies and other big players in the financial markets. Regulators in Europe are following suit, and hopefully Asia will as well. As a result, we will soon be able to model and identify potential system-wide risks.

To get an idea of the challenge of modelling system-wide financial risk, consider an indoor shopping mall that charges people to enter and exit. To model the movement of shoppers, we could build a purely statistical model of the door traffic, and in most situations this would be sufficient. However, in extreme situations, such as a fire, the system would change dramatically. Shoppers would rush for the nearest exits and ticket-takers would get overwhelmed and close their doors. Then shoppers would rush to the next set of doors. In such cases, a statistical model would get it horribly wrong.

Equilibrium or structural models of the same system would track 
and predict the motives — and, therefore, the movement — of each 
individual. In normal times, these models are too complex and hard 
to calibrate — imagine trying to quantify all of the reasons that people 
go shopping. But in times of distress, as the shoppers’ motives and 
behaviour converge (get out!), the model output improves. 

To fully understand and predict the dynamics of a market in 
crisis, we have to understand the capacity of the 
market-makers (the ticket-takers at the door of 
the shopping mall) and the demand for assets 
when prices lurch significantly away from present
levels (the number of shoppers trying to get
out versus the number trying to get in). The new 
data will allow models to do this for the first time. 

Clearly, regulators have a responsibility to 
build such models and to use them to monitor 
for potential crisis. To do this, they will need to 
leverage expertise among scientists by supporting 
and encouraging research in universities and labs, 
and by hosting the more applied work to maintain
confidentiality. Bankers should join this effort
too, if only to avoid forcing regulators to use crude 
tools to set prices on these risks — through capital 
ratios or transaction taxes, for example. 

Bankers should work in parallel and form an industry group that 
collects system-wide data from its members, organizes resources for 
scientists to develop the necessary models, and creates a secure and 
confidential infrastructure for members to determine the price of system-wide
risks. The industry already has a group that does something
similar — the international Operational Riskdata eXchange Association,
which shares operational-loss data among member firms.

Everyone would benefit if bankers were to engage with scientists to 
build the infrastructure needed to price system-wide risk. Banks could 
get feedback about common holdings and trading strategies, which 
would allow them to adjust their behaviour and avoid following the 
herd. Regulators would have extra market information to help them 
to determine when to act to ensure stability. And the rest of us could 
have increased confidence in the financial system.


John Liechty is director of the Center for the Study of Global Financial 
Stability and Professor of Marketing and Statistics at Pennsylvania 
State University in University Park. 

e-mail: jcll2@psu.edu 
A look at backyard
biodiversity 


The choices city dwellers make
when deciding which plants
to cultivate in their gardens or
yards could affect the function
and health of the wider plant
ecosystem.

Sonja Knapp at the 
Helmholtz Centre for 
Environmental Research in 
Halle, Germany, and her group 
compared the characteristics 
and diversity of plants in 137 
residential yards with those of 
a nearby nature reserve. They 
found that yard plant species 
were more closely related 
to each other, shorter-lived, 
faster-growing and more likely 
to be self-pollinating. 

As yard plants spread into 
natural habitats, the ability of 
those ecosystems to respond to 
environmental change could 
be reduced, the authors say. 
Ecology http://dx.doi.org/10.1890/11-0392.1 (2012)


Licking ants fight fungal infection

Healthy ants that rub up against infected counterparts or even lick pathogenic fungal spores off them may be immunizing themselves and, ultimately, protecting their whole colony.

Sylvia Cremer at the Institute of Science and Technology Austria in Klosterneuburg and her colleagues infected ants (Lasius neglectus; pictured) with fluorescently labelled fungal spores (Metarhizium anisopliae) and released them among healthy members of their colony. The authors found that spores frequently transferred to healthy ants, resulting in low-level infection. Genetic analysis revealed that these minor infections upregulated a set of immune-system genes that bolstered the ants’ anti-fungal defences. Computer modelling suggests that this ‘social immunization’ actively stimulates the ants’ immune systems, allowing the colony as a whole to fight infection.
PLoS Biol. 10, e1001300 (2012)


GEOLOGY

Hot tuff not so tough

Tuffs are volcanic rocks commonly used as building materials despite their notorious weakness — and at least one popular tuff could pose an even greater hazard in the event of a fire. Michael Heap at the University of Strasbourg in France and his colleagues examined three types of tuff commonly used in buildings in the Neapolitan region of Italy (pictured). Two exhibited no reduction in strength after thermal stressing, but the most commonly used one, known as Neapolitan yellow tuff, lost 80% of its compressive strength as temperatures reached 1,000°C. This is explained by the fact that Neapolitan yellow tuff contains zeolite minerals that are sensitive to heat.

The team suggests that the results be considered in establishing regional fire codes and recommends similar tests for building tuff in other regions.

Geology 40, 311-314 (2012)


MicroRNAs boost gene variation

Small RNA molecules that regulate and stabilize the expression of certain genes in humans may also promote and preserve variations in gene expression between individuals and ethnic groups. Jian Lu and Andrew Clark at Cornell University in Ithaca, New York, examined the expression profiles of protein-coding genes that are influenced by microRNAs (miRNAs) and were obtained from multiple human populations. The authors compared these profiles with those of genes not targeted by miRNAs. Expression of some of the miRNA-regulated genes varied little across populations, or between humans, chimpanzees and macaques. However, most differed from one individual to another and between ethnic groups.
Genome Res. http://dx.doi.org/10.1101/gr.132514.111 (2012)


NANOTECHNOLOGY 


Lasers sort 
particles by size 


Gold nanoparticles have a 
range of biomedical uses, 
from detecting tumours to 
delivering drugs. However, 
their size is important because
it affects their optical and
mechanical properties, as
well as their toxicity. Martin
Ploschner and his colleagues at 
the University of St Andrews, 
UK, report an efficient way to 
sort gold nanoparticles by size 
using laser light. 

The team aimed green and 
red lasers at a thin layer of 
water containing a mixture 
of gold nanoparticles 100 and 
130 nanometres in diameter. 
The green light’s frequency 
matched that of the electrons 
in the smaller nanoparticles. 
This resonance enhanced 
forces acting on the particles, 
pushing them in one direction. 
The red light interacted with 
the larger particles, moving 
them in the opposite direction. 

The researchers suggest 
that the method could sort 
nanoparticles more finely than 
current methods, which rely 
on centrifugation. 

Nano Lett. http://dx.doi.org/10.1021/nl204378r (2012)


Fewer imprinted 
genes at re-count 


Most mammalian cells have 
one maternal and one paternal 
copy of most genes, but some 
genes carry a molecular 
signature or ‘imprint’ that
silences one copy. Tomas 
Babak at Stanford University 
in California and his team 
mapped the imprinted genes 
in mouse brains and found far 
fewer than recent estimates 
had suggested. 

In 2010, two studies found 
more than 1,300 imprinted 
genes in the mouse brain, ten 
times more than traditional 
counts. The increase was 
attributed to improved 
RNA-sequencing technology. 
When Babak et al. repeated 
the experiments, they found 
only 13% of the imprinted 
genes first identified by the 
2010 studies and uncovered 
statistical weaknesses that 
resulted in many false-positive 
signals. Using a different 
analytical approach, the 
authors identified roughly 
50 new candidate imprinted 
genes. 


Having a catalogue of 
imprinted genes is important 
for understanding why 
imprinting occurs and how it 
can go awry. 

PLoS Genet. 8, e1002600 (2012)


Follow the 
lymph vessels 


Lymph vessels grow as wounds
heal and cancers spread — a
process that can be imaged
in living animals, researchers
demonstrate in mice.

Lymph vessels often sprout 
at sites of inflammation, 
and their growth has been 
linked to tumour metastasis. 
Sagrario Ortega at the 
Spanish National Cancer 
Research Centre in Madrid 
and her colleagues genetically 
engineered a mouse to express 
a luminescent protein under 
the control of the gene Vegfr3, 
a lymphatic marker. 

The team imaged live mice, 
tracking vessel growth during 
embryo development, wound 
healing and inflammation. 
They also watched as lymph 
vessels grew at the edge of 
melanoma tumours and in 
lymph nodes infiltrated by 
the cancer. This vessel growth 
may aid the spread of cancer to 
distant organs, the authors say. 
Proc. Natl Acad. Sci. USA http://dx.doi.org/10.1073/pnas.1115542109 (2012)


A graphene window on liquids

By using graphene membranes as viewing ‘windows’, researchers have filmed nanocrystals growing in liquids at atomic resolution. Studying structures in liquids at the atomic level is challenging because the imaging technique of choice, transmission electron microscopy, requires that samples be in a vacuum to maximize their interactions with the electron beam. Air-tight capsules can be used to enclose liquids, but are thick and made of materials that interfere with passing electrons, resulting in a blurred picture. Membranes made of graphene — atomically thin sheets of carbon atoms — are both impermeable to liquids and much more transparent to electrons.

Paul Alivisatos at the Lawrence Berkeley National Laboratory in Berkeley, California, and his colleagues used these graphene windows to create atomic-resolution movies of platinum nanocrystals clumping together in a liquid.

Science 336, 61-64 (2012)


HIGHLY READ on www.cell.com in March

Small, cancer-resistant mice

Boosting the levels of a tumour-suppressor protein in mice makes them smaller and more metabolically efficient, as well as resistant to cancer.

Pier Paolo Pandolfi at Beth Israel Deaconess Medical Center in Boston, Massachusetts, and his colleagues genetically engineered mice to have additional copies of Pten, a gene that is mutated or deleted in many cancers. The mice are smaller than normal because they have fewer cells. When injected with a carcinogen, the animals developed tumours later than controls.

The transgenic mice burn energy at a higher rate. Cells from these mice consume less glucose than normal mouse cells but generate more ATP — the energy molecule created during cellular respiration — indicating a more efficient metabolism.

Increasing levels of the PTEN protein could offer a therapeutic approach to preventing both cancer and obesity.

Cell 149, 49-62 (2012)


Vision with no 
nervous system 


Sponge larvae can detect light 
despite lacking a nervous 
system or the photosensitive 
‘opsin’ proteins found in all
other known animal eyes. 
Instead, another pigment called 
cryptochrome may underlie 
the light-sensing ability of 
the sponge Amphimedon 
queenslandica (pictured), 
report Todd Oakley at the 
University of California, Santa 
Barbara, and his colleagues. 
Cryptochromes mainly 


absorb blue light and, in other 
animals, have been implicated 
in functions from setting 
circadian rhythms to sensing 
magnetic fields. The authors 
identified two cryptochrome 
genes in the sponge. One, 
Aq-Cry2, is expressed in the 
‘ring eyes’ of A. queenslandica 
larvae and has an absorbance 
peak similar to the wavelengths 
that trigger larval activity. 
Because eye evolution in 
other animals has always 
involved opsins, the use of 
cryptochrome represents 
a separate lineage of eye 
evolution, the team suggests. 
J. Exp. Biol. 215, 1278-1286 
(2012) 
SEVEN DAYS


POLICY 


North Korea launch 


North Korea has continued its plans to launch a long-range rocket in defiance of international pressure not to break a ban on missile testing. With a launch planned for between 12 and 16 April, the rocket (which North Korea says carries a weather satellite) was moved into position at the Sohae Satellite Launching Station on the country’s northwestern coast last week. The plans have brought a swift end to February's agreement with the United States for food aid in return for a moratorium on nuclear testing, uranium enrichment and long-range ballistic missile development (see Nature 483, 128; 2012).


Astrophysics review 


NASA's exoplanet-hunting space telescope, Kepler, is among nine astrophysics missions to have their lifetimes extended after a performance review by external scientists. Both Kepler and Swift, which detects γ-ray bursts, will run through to 2016. Although it recommended extensions for all the missions in its 3 April report, the review committee criticized the Hubble Space Telescope for a lack of transparency in reporting operating costs, and ordered Fermi, a γ-ray telescope, to cut costs by 10% each year from 2014. Only Spitzer, an infrared telescope, will be phased out earlier than its mission leaders wanted, in 2015. See go.nature.com/r6jgjo for more.


Carbon capture

The United Kingdom has relaunched a £1-billion (US$1.6-billion) competition that will offer funding to companies that build commercial-scale facilities to capture and store carbon dioxide emissions from power plants, to be in operation by 2016-20. The contest was first announced in 2007, but the bidders all pulled out. The relaunch, announced on 3 April, includes contracts guaranteeing the sale of the plants’ electricity, which removes some financial risk. The competition also includes a further £125 million for research. See page 151 for more on carbon-capture plans around the world.


Malaria drug resistance spreading

Malaria parasites that are resistant to the most effective current treatment — drugs based on artemisinin — are spreading in southeast Asia. Resistance was first confirmed in western Cambodia, close to Thailand, in 2008. But it is also emerging 800 kilometres westward, along the border of Thailand and Myanmar — where villagers are shown being screened for malaria. Resistance will reach rates reported in western Cambodia within 2-6 years, says a 5 April study on the effectiveness of artemisinin treatments in more than 3,200 patients during 2001-10 (A. P. Phyo et al. Lancet http://doi.org/hsw; 2012). See go.nature.com/wefjya for more.


US public health

A committee convened by the US Institute of Medicine has recommended a suite of actions to remedy what it calls “dysfunction” in the funding, organization and capability of the US public health-care system. Apart from urging a doubling of federal health-care spending, to be funded by taxing medical transactions, the 10 April report also endorsed more research into the effectiveness and value of public-health strategies. The 2010 Patient Protection and Affordable Care Act authorized a programme of research focused on similar issues — but that legislation is currently being challenged in the courts.


BUSINESS

Solar insolvencies

German solar-cell manufacturer Q-cells — which was the world’s largest maker of solar panels just five years ago — has filed for insolvency. The company, based in Bitterfeld-Wolfen, lost €846 million (US$1.1 billion) last year and had been trying to restructure its debts, but gave up on 3 April. Three other German solar firms have filed for insolvency since December, as state subsidies decline and the market is flooded with low-cost photovoltaic modules and polysilicon raw material, much of it from China.


© 2012 Macmillan Publishers Limited. All rights reserved


Imaging Alzheimer’s

The US Food and Drug Administration on 6 April approved the first diagnostic test for plaques in the brain, which are often associated with Alzheimer’s disease. The test, developed by Eli Lilly, a pharmaceutical firm based in Indianapolis, Indiana, relies on a compound called florbetapir (Amyvid), which binds to amyloid plaques and can be imaged in living patients (see Nature 469, 458; 2011). Florbetapir assays will complement existing behavioural diagnoses of dementia, but researchers hope that the tests will eventually become a tool for early diagnosis and treatment of the disease.


GM pigs on hold

A genetically modified pig intended for human consumption seems unlikely to reach the dinner table any time soon, after local hog producers decided to pull their funding for the project. The Enviropig, developed at the University of Guelph in Canada, contains a transgene that enables it to better absorb phosphorus from its food, in turn reducing the phosphorus content of its manure and counteracting problems such as algal blooms in waterways fertilized by phosphorus run-off. It has not yet been approved for human consumption, and Ontario Pork, a Guelph-based group for hog producers that had put around Can$1.3 million (US$1.3 million) into the project, said last week that it would not provide any further funding because the genetics had been proven; interest from industry would be required to restart the research.


TREND WATCH

The Spanish government's austerity budget has lopped one-quarter off funding for research and development — much larger than the average 17% cut applied to other central-government departments. But public research funds may not be as badly hit as these figures suggest. More than half of overall funding relates to tax credits for technology transfer, for example, and last year almost half of those went unused. See go.nature.com/yalqgl for more.

SOURCE: COSCE


PEOPLE
Anti-doping row 


Complaining of restrictions to his freedom of speech, an anti-doping researcher has resigned from the Swiss panel that oversees biological passports — biochemical profiles of athletes that help to detect doping (see Nature 475, 283-285; 2011). Michael Ashenden, who heads the Science and Industry Against Blood Doping Research Consortium in Gold Coast, Australia, said that he quit the Athlete Passport Management Unit in Lausanne because it added a confidentiality clause asking experts not to make public comments for eight years after they left the panel. The unit has just taken over the biological-passport system from the International Cycling Union in Aigle, Switzerland.


Heart-institute head 
The US National Institutes of Health (NIH) has named cardiologist Gary Gibbons as director of the National Heart, Lung, and Blood Institute (NHLBI). Gibbons (pictured) is currently at the Morehouse School of Medicine in Atlanta, Georgia, where he studies the genomics of vascular disease in minority populations. The NHLBI is the third-largest NIH institute, with a budget of roughly US$3 billion, and has been without a permanent director since late 2009. Gibbons, replacing acting director Susan Shurin, plans to take up the post this summer.


Telescope rivalry

The 24.5-metre Giant Magellan Telescope (GMT), to be built in Chile, will not compete with its rival Thirty Meter Telescope in a contest for backing by the US National Science Foundation (NSF). With a mere US$1.25 million for the winner, the contest was more about prestige than money. The NSF cannot provide significant cash for either ground-based telescope until at least 2020, although the winner might find it easier to attract international partners. Last week, the GMT’s board of directors declared that the project could proceed on its own. See go.nature.com/rwc7df for more.


SPAIN SLASHES RESEARCH SPENDING

Spending on research and development in Spain drops 26% in the central government’s draft budget.
(Chart: spending in € (billions) by year, 2002–2012 (projected).)


COMING UP

16-20 APRIL
The origin and evolution of life is the subject of a biennial gathering for astrobiologists, this year held in Atlanta, Georgia.
go.nature.com/5blicy

16-21 APRIL
In Panama City, the newly formed Intergovernmental Platform on Biodiversity and Ecosystem Services lays out its priorities for monitoring global ecology.
go.nature.com/o3zm54


Swedish bioscience 
Sweden’s government said on 3 April that the Science for Life Laboratory (SciLifeLab), an existing bioscience collaboration between four universities, will in 2013 expand into a national research institute for molecular biosciences and bioinformatics in Stockholm. It will eventually employ 1,000 scientists. The centre — which currently employs 300 scientists in Stockholm and Uppsala — is to receive 220 million Swedish kronor (US$32.9 million) from the private Knut and Alice Wallenberg Foundation in Stockholm, and between $25 million and $50 million from pharmaceutical firm AstraZeneca. The government’s contribution will be announced this autumn.
REGENERATIVE MEDICINE


It’s boom time for firms selling stem-cell treatments, such as those derived from umbilical blood.


China’s stem-cell 
rules go unheeded 


Health ministry’s attempt at regulation has had little effect. 


BY DAVID CYRANOSKI 


Three months after the Chinese health ministry ramped up its efforts to enforce a ban on the clinical use of unapproved stem-cell treatments, a Nature investigation reveals that businesses around the country are still charging patients thousands of dollars for these unproven therapies.

The clinics operate openly, with websites promoting the treatments for serious disorders such as Parkinson’s disease, diabetes and autism, and attract thousands of medical tourists from overseas. They advertise case studies of individual patients who they say have benefited from the treatments, and some have clinics in major hospital complexes, giving them an air of mainstream acceptance. Stem-cell experts contacted by Nature insist that such therapies are not ready for the clinic and say that some may even endanger patients’ health. But the Chinese government is struggling to enforce its ban.

In May 2009, the Chinese Ministry of Health 
classified stem-cell treatments as Category 3 
medical technologies, defined as “high risk” 
and requiring the approval of a technical audit
board before use. So far, no approvals have 
been granted. Despite this, “one 2009 estimate 
put the number of stem-cell companies based 
in China at around 100”, says Doug Sipp, a 
stem-cell ethics and regulation researcher at 
the RIKEN Center for Developmental Biol- 
ogy in Kobe, Japan. In his view, “even after the 
reform efforts by the Ministry of Health, the
industry apparently continues to grow.”

In January, recognizing the worsening situa- 
tion, the health ministry announced a package 
of rules for the industry. Organizations using 
stem cells must register their research and 
clinical activities, the source of the stem cells 
and ethical procedures. The ministry asked 
local health authorities to halt any unapproved 
clinical use of stem cells in their regions. And 
it called for a nationwide moratorium on new 
clinical trials for stem-cell therapies, adding 
that patients in existing clinical trials should 
not be charged. 

So far, however, the ministry’s clampdown 
has proved ineffective. According to a Min- 
istry of Health spokesman, not one clinic has 
registered in the required way, and Nature has 
found that many stem-cell clinics continue 
to offer treatments. Shanghai WA Optimum 
Health Care, for example, which has plush 
headquarters in a gated estate in one of the 
wealthiest areas of central Shanghai, claims 
success in using stem cells derived from umbil- 
ical cord or adipose tissue to treat a range of 
disorders, from autism to multiple sclerosis. 
Tony Lu, a member of the company’s science 
and technology board, says that four to eight 
injections of such cells can treat Alzheimer’s 
disease, at a cost of 30,000-50,000 renminbi 
(US$4,750-7,900) per injection. According to 
the company’s senior patient-liaison officer, 
Karina Grishina, autism can be treated with 
an adipose-tissue-derived cell injection for 
200,000 renminbi, followed a few days later 
by a 50,000-renminbi injection of umbilical 
cord cells. 

In Changchun, Tong Yuan Stem Cell claims 
to have treated more than 10,000 patients 
with a variety of disorders, including Parkin- 
son’s disease. A representative says that it also 
offers a one-year, four-injection autism treat- 
ment protocol using stem cells from aborted 
fetuses. Meanwhile, Beijing Puhua Interna- 
tional Hospital’s Stem Cell Treatment Center 
offers a four-to-five-injection protocol for 
autism, costing 205,000 renminbi. 

Those clinics all claim success in treating 
patients, but none has published data from 
controlled clinical trials. Zhou JingLi, chief 
neurologist at Beijing Puhua, says that many 
of the company’s autistic patients have shown 
marked improvements in their condition a 
couple of weeks after treatment. She agrees 
that controlled clinical trials are needed to
verify the efficacy of their treatments. “But,”
she asks, “who’s going to pay?” She adds that 
first-hand experience with patients is enough 
to show that stem-cell treatment is worthwhile. 
“First, it’s safe; second, it’s effective. We know 
that,” she says. 

Leading stem-cell scientists think otherwise. 
Commenting on Tong Yuan’s treatment for 
Parkinson's disease, Oliver Cooper, director of 
the Stem Cell Facility of the Neuroregeneration 
Institute at McLean Hospital in Belmont, Mas- 
sachusetts, and a specialist in Parkinson's dis- 
ease, says, “The products offered by Tong Yuan 
may provide anecdotal, poorly controlled, 
transient improvements in the patients, but 
Parkinson’s-disease patients need long-term 
therapies.”

“There are neither scientific nor clinical data 
to support the long-term benefits of haema- 
topoietic- or neural-stem-cell therapies for 
Parkinson’s patients,” he adds. “In fact, it’s not 
clear if the infused cells will survive for more 
than a few days in the patients.” 

Meanwhile, neurobiologist Ricardo 
Dolmetsch, an autism researcher at Stanford 
University in California, says, “The consensus 
in the autism research community, as well as 
in the stem-cell community, is that there is no 
scientifically valid reason for using stem cells
to treat autism spectrum disorders”. He worries
that, without the proper safety studies in
place, the treatments could “lead to serious 
complications like cancer and autoimmune 
disease”. 

In addition to anecdotal case studies, some 
Chinese stem-cell companies bolster their 
reputations by claiming to have connec- 
tions with leading politicians and scientists. 
A glossy ‘information memorandum’ from
Shanghai WA Optimum Health Care con- 
tains pictures of staff with various local- and 
central-government figures, including Li 
Keqiang, the powerful executive vice-premier
of the State Council who is tipped to succeed 
Wen Jiabao as China’s next premier. 

It also lists Li Lingsong, director of the 
Peking University Stem Cell Research Center, 
as a member of its science and technology 
board. Li denies this. “I have so far nothing to 
do with WA,” he told Nature, adding that he has
asked the company to remove his name. WA 
also claims a strategic alliance with Harvard 
Medical School, although neither the medical 
school nor the Harvard Stem Cell Institute is 
aware of any such connection. Likewise, the 
University of California, Irvine, where WA 
claims to have research facilities, denies any 
formal relationship. 


When pressed, all of the stem-cell clinics 
approached by Nature said that they were 
aware of the government regulations, and 
that they were necessary — but only for other 
clinics that were not operating safely. Most 
emphasized that their own businesses were 
entirely legitimate. Nature did find one com- 
pany, Shanghai Puhua, which says that it has 
already stopped offering stem-cell treatments 
to comply with government regulations. And 
Beijing-based Wu Stem Cells will probably do 
likewise for the same reason, says company 
director Cheng Bo. 

Bioethicist Zhai Xiaomei at Peking Union 
Medical College in Beijing, a member of one 
of the government’s technical audit boards, was
surprised to hear that any stem-cell companies 
were still operating. She says that the regula- 
tions are “absolutely clear” that companies 
must not administer unapproved stem-cell 
treatments. 

A Ministry of Health representative told 
Nature that it was aware of the problem, and 
that it would be making greater efforts to clean 
up the stem-cell business. 

Asked how WA operates despite the minis- 
try’s ban, Lu describes the regulations as “a grey 
area”. Grishina agrees: “We're in China, so there
are different stipulations.” ■ SEE EDITORIAL P.141


BIOSAFETY 


Post-mortem on mutant flu 


Virus papers get green light but controversy highlights lack of global rules on biosafety. 


BY DECLAN BUTLER 


The dust is beginning to settle on the
months-long controversy over two
studies in which the H5N1 avian influenza
virus was modified to be transmissible
between mammals. But scientists and authori- 
ties still need to address the lack of interna- 
tional oversight for studies in which pathogens 
are deliberately made more dangerous, speak- 
ers emphasized at a two-day meeting held 
last week at the Royal Society in London. The 
meeting brought together scientists, research 
funders and experts in security, bioethics and 
foreign policy, just days after the US National 
Science Advisory Board for Biosecurity 
(NSABB) revised its earlier stance and recom- 
mended publication of the two studies. 

In December 2011, the board recommended 
redaction of experimental details of the 
studies, on the basis of concerns about bio- 
terrorism and the increased likelihood of 
accidental release of the viruses. But after con- 
sidering revised versions of the manuscripts on 
29-30 March, the NSABB voted unanimously 
Yoshihiro Kawaoka can now publish his flu study.


in favour of full publication of the paper sub- 
mitted to Nature by Yoshihiro Kawaoka of the 
University of Wisconsin–Madison, and 12–6
for publication of the content (although not 
the specific wording) of the paper submitted 
to Science by Ron Fouchier of the Erasmus 
Medical Center in Rotterdam, the Netherlands.

Kawaoka presented his findings for the first 
time at the Royal Society meeting (see Nature 
http://doi.org/hsn; 2012), but Fouchier gave 
only a summary of his, saying that a more 
detailed description was prohibited by Dutch 
export-control laws, which require a permit 
to disseminate samples of, and information 
about, certain dangerous pathogens. 

In December, the NSABB said that the 
information in the papers could help H5N1 
surveillance efforts and so should be made 
available to experts on a need-to-know basis. A 
major factor in the board’s change of heart was 
that subsequent international discussions con- 
cluded that there was no practical way to selec- 
tively share the data, and that national export 
controls may restrict distribution anyway, says 
Michael Imperiale, a virologist at the University 
of Michigan in Ann Arbor and a member of the
NSABB. 

This left the board with the stark choice of 
publishing either all or none of the research, 
with publication becoming “the only way for 
public-health authorities to know what to look 
for”, says Imperiale, who this time voted in
favour of publishing both papers. 

Some on the board were also influenced by 
clarifications to both papers — particularly 
those suggesting that Fouchier’s viruses were 
not as pathogenic as they had initially seemed —
and by presentations from Kawaoka and Fouch- 
ier indicating that some combinations of the 
mutations generated in their laboratories had 
already been seen in the wild. But Imperiale says 
that his vote was unaffected by the revisions. “I 
don't think the risks have changed; the authors 
have changed the host range and transmission 
properties of a deadly virus,” he says. “I don't 
think the benefits have changed either.”

Another factor in the NSABB’s decision 
was the announcement on 29 March of a new
US policy requiring that all publicly funded 
research on certain pathogens be assessed 
from the outset by the funding agency for the 
risk that it may be misused (see Nature
http://doi.org/hsp; 2012). Experts have generally welcomed
the guidelines, which they say should,
if properly applied, also help to avert future
repeats of the H5N1 controversy, in which
the NSABB learned of the papers only shortly 
before their planned publication. 

Arthur Caplan, a bioethicist at the University 
of Pennsylvania in Philadelphia, was one of 
several speakers at the Royal Society meeting 
to argue that it would be a “mistake” to think 
that the issues raised by the papers are now fully 
resolved. Besides bioterrorism, a major concern 
is that publishing them will result in worldwide 
proliferation of similar research, possibly in 
labs that may not have well developed biosafety 
cultures and training. Both Fouchier and Kawa- 
oka worked in facilities rated at ‘biosafety level 
3 enhanced’, but to expect that all such research
would be done carefully everywhere is “utter 
malarkey”, Caplan told the meeting.

The World Health Organization has 
recommended that work to make H5N1 
viruses more transmissible in mammals, which 
flu researchers voluntarily halted in January, 
remain suspended until the relevant authori- 
ties have assessed the safety conditions for such 
research (see Nature 482, 447-448; 2012). US 
and Dutch authorities are expected to release 
their verdicts within weeks, which Kawaoka 
said “would be the time to lift the morato- 
rium”. But others warned that doing so before 
broader debate, including planned hearings by 
US lawmakers, would be premature and could 
be perceived as arrogant. 

The challenge for any oversight system will 
be to avoid discouraging important science 
while ensuring that work is limited to labs with 
appropriate safety standards, says one scientist, 
who criticizes the reluctance of many flu 

researchers to admit that 
Despite delays, China’s GreenGen coal-fired gasification power plant in Tianjin is going ahead.


CLIMATE CHANGE 


Slow progress to 
cleaner coal 


China moves forward with demonstration power plant as 
United Kingdom revives carbon-capture programme. 


BY JEFF TOLLEFSON AND 
RICHARD VAN NOORDEN 


With many of the world’s nations
dragging their feet on cleaning up
fossil-fuel emissions, even slow
progress stands out. This spring, China’s state-owned
Huaneng Group plans to fire up the
first phase of its flagship clean-coal demon- 
stration project, moving the country one step 
closer to capturing and storing the carbon it 
emits. Despite being more than a year behind 
schedule, the GreenGen coal gasification 
plant in Tianjin puts China at the forefront of 
global efforts to exploit coal resources without 
releasing carbon dioxide. 

In 2008, leaders of the G8 group of nations 
called for the development of 20 large-scale 
projects demonstrating technologies for 
carbon capture and storage (CCS) by 2010, 
but countries have been slow to embrace the 
costly plants. Delays and cancellations have 
affected all but a handful of high-profile 
initiatives in Europe, the United States and 
Australia, whereas China, despite delays of 
its own, is still pushing forward to develop 
indigenous technologies. 

“GreenGen represents both a high degree 
of technical sophistication and a real commitment
on China’s part to clean-energy
technology,” says Julio Friedmann, head of the
carbon-management programme at Lawrence 
Livermore National Laboratory in Livermore, 
California. “There can be no doubt that China 
has achieved something remarkable.”
Originally estimated to cost US$1.5 billion, 
GreenGen is being developed by a consortium
of Chinese companies, including Huaneng, 
together with Peabody Energy of St Louis, 
Missouri. The first phase is a 250-megawatt 
integrated gasification combined-cycle power 
plant, which will convert coal into ‘syngas’ — 
a mixture of carbon monoxide and hydrogen 
— to be burned in specialized turbines to produce
electricity. Waste carbon dioxide from
these processes can be separated more easily 
than in conventional coal-fired power plants. 
Huaneng has already begun work on a 
second phase — a smaller pilot plant that 
will send a clean stream of hydrogen through 
fuel cells and turbines to produce electric- 
ity, with carbon dioxide being captured for 
industrial use. The third phase, scheduled 
for 2015-20, will be a 400-megawatt power 
plant with full-scale carbon capture and 
storage in underground rock layers. That 
represents a substantial delay beyond the 
original completion date of 2015. Huaneng 
officials say that they revised the schedule in 
response to technical issues and delays to
CCS projects in other nations.

GreenGen was originally seen as a 
follow-up to FutureGen, the flagship US 
initiative that was promoted and then can- 
celled by former president George W. Bush. 
In 2009, President Barack Obama revived 
the project, but a business consortium has 
yet to settle on a home for it. Many are scep- 
tical about whether FutureGen and other 
US CCS projects will come to fruition, 
given the inability of Congress to craft a 
comprehensive programme for reducing 
greenhouse-gas emissions. 

The United Kingdom is also trying to 
renew interest in CCS. On 3 April, the UK 
Department of Energy and Climate Change 
relaunched a £1-billion (US$1.6-billion) 
competition for companies to build CCS 
demonstration plants. An earlier initiative, 
launched in 2007, fell apart in 2011 when 
the last remaining consortium pulled out 
because of concerns over costs. This time, 
the UK government has encouraged compa- 
nies by promising that electricity produced 
by CCS plants can be sold at premium prices, 
and it will also provide £125 million for CCS 
research and development. 

Many countries still see potential CCS 
plants as one-off demonstration projects, 
but the United Kingdom is trying to create 
conditions for them to become commer- 
cially viable, says Stuart Haszeldine, a 
geologist working on CCS at the University 
of Edinburgh, UK. So far, seven consortia 
have publicly announced their interest in the 
contest, which aims to commission plants in 
2016-20 and store carbon dioxide in porous 
rock under the North Sea. 

Other well-advanced initiatives include 
Canada’s Boundary Dam Integrated Carbon 
Capture and Storage Demonstration Project 
in Saskatchewan, a 100-megawatt project to 
retrofit carbon-capture technology to an 
existing power plant; and the Texas Clean 
Energy Project, a 400-megawatt integrated 
gasification combined-cycle plant that 
could begin operating in 2014. Both plan to 
sell carbon dioxide to oil companies, which 
use the gas to flush oil out of reservoirs. In 
the Netherlands, the Rotterdam Capture 
and Storage Demonstration project aims to 
begin storing carbon dioxide in depleted gas 
fields under the North Sea in 2015. 

Howard Herzog, a carbon-storage 
expert at the Massachusetts Institute 
of Technology in Cambridge, says that 
the challenge facing the entire carbon- 
capture industry is how to turn a profit 
in the absence of a serious carbon policy 
that rewards emissions-reduction projects 
financially. “The question is whether there 

is a CCS market any- 
Roche chases stake in
medical sequencing 


Biotech firm Illumina continues to resist takeover — but 
analysts suggest that a merger is inevitable. 


BY ERIKA CHECK HAYDEN 


Gene sequencing has mainly been
the province of technology companies
catering to researchers, but the
pharmaceutical giant Roche, based in Basel, 
Switzerland, has other plans. It is increasing the 
pressure on Illumina, a DNA-sequencing com- 
pany headquartered in San Diego, California, 
hoping to absorb the firm and use its expertise 
to capture the growing market in personalized 
medicine. 

Roche proposed a merger in January, offering 
US$44.50 per share, which Illumina’s board 
rejected. On 29 March, Roche raised its bid to 
$51 per share, for a total acquisition price of 
$6.7 billion. Illumina’s board rejected this offer,
too, and adopted a ‘poison-pill’ provision that
would give present investors the right to buy 
new shares at a reduced price, thereby dilut- 
ing the value of Roche's shares if the deal goes 
through. But many observers say that the 
merger is all but certain. “If Roche really wants 
this, they are ultimately going to be able to 
make this go through,” says John Haggerty, a 
mergers-and-acquisitions lawyer with Goodwin 
Procter in Boston, Massachusetts. 

Roche’s interest in Illumina stems from the
latter's dominant position in genetic research; 
it claims that 90% of the world’s sequencing 
is done on its machines. Roche believes that 
whole-genome sequencing is poised to become 
a lucrative diagnostic tool, and that Illumina
will allow it to break into that market. 

Illumina could benefit from Roche's vast sales 


DOWN AND UP 


Illumina’s share price fell last year owing to gloom
over biomedical-research funding. Roche’s January 
bid boosted it, but only part way. 
force, which would make it more competitive
with its major rival, Life Technologies of 
Carlsbad, California. And Roche’s experience in 
diagnostics and drug development could speed 
the translation of sequencing into the clinic, says 
David Ferreiro, a Boston-based analyst at the 
investment bank Oppenheimer. Manufacturers 
are eager to move from next-generation 
sequencing technology into diagnostic tech- 
nology, and “Roche could do that a lot faster” 
than Illumina could, he says.

Yet Illumina has fought back, arguing to its 
shareholders that Roche's offer undervalues the 
company (see ‘Down and up’). Illumina, which
started out in the 1990s as a microarray vendor, 
came to dominate the sequencing market by 
steadily cutting the cost and improving the 
speed and ease of use of its products. 

However, Illumina now has numerous com- 
petitors with new platforms nipping at its heels. 
The firm is cagey about what technologies could 
replace its current sequencing machines, even as 
the newcomers, such as Oxford Nanopore Tech- 
nologies in Oxford, UK, are promising big leaps 
in the speed and price of sequencing — with 
the aim of attaining the $1,000 human genome. 
Industry observers question whether Illumina’s 
nimbleness will survive a Roche merger. They 
point to Roche’s 2007 acquisition of sequencing
company 454 Life Sciences in Branford, 
Connecticut, which has since failed to live up 
to hopes that it would capture a significant part 
of the market. 

To up the pressure, Roche has nominated six 
people for Illumina’s board of directors who, if 
elected by shareholders at the company’s annual 
meeting on 18 April, might coax the board 
to work out a deal. But with advisory firms 
cautioning major shareholders to hold out, 
Roche will probably continue to raise its price 
to entice institutional investors and arbitrage 
traders, who bought Illumina shares expecting 
Roche to sweeten its offer. “It’s just a matter of 
how much the Illumina board can force them 
into raising their price before the stockholders 
start accepting the offer,” says Haggerty.

For Illumina, the end could be bittersweet. 
The company may be in a prime position to
dominate the potentially lucrative clinical- 
sequencing market. But it may also see the 
end of its glory days as a key innovator in 
sequencing. ■
Sea change: previous ideas that lowlands on Mars (blue) once hosted oceans are being overturned.


ASTRONOMY 


Dreams of water 
on Mars evaporate 


Climate models reveal the red planet was mostly cold and dry. 


BY ERIC HAND 


The debate began when nineteenth-century
Italian astronomer Giovanni
Schiaparelli thought he saw water-filled
canali, or channels, on the red planet:
just how wet was Mars? “This is a pendulum 
that has been swinging back and forth,” says 
Jeff Andrews-Hanna, a planetary scientist at 
the Colorado School of Mines in Golden. The 
canali were an illusion, and no one doubts that 
Mars today is dry except for possible meagre 
seeps of groundwater. But in recent years 
researchers have come to accept that ancient 
Mars had lakes or even oceans — favourable 
conditions for life. “This is what started off the 
fever for Mars science,” Andrews-Hanna says.
But the pendulum is swinging again. Last 
month, Jim Head, a planetary scientist at Brown 
University in Providence, Rhode Island, threw 
a wet blanket on the idea that Mars was ever 
very wet at all, in a keynote talk at the Lunar and 
Planetary Science Conference in The Wood- 
lands, Texas. Head and others are assembling a 
picture of a Mars that was cold and dry from the 
beginning, punctuated at most by short bursts 
of wetness. “The notion of a palm-tree-covered
Mars has waned,” says Stephen Clifford, a
planetary scientist at the Lunar and Planetary 
Institute in Houston, Texas, who is organizing a 
conference in May on the early climate of Mars. 


The first spacecraft to visit Mars, in the 
1960s and 1970s, showed a bone-dry planet 
pocked with craters, much like the Moon. 
But high-resolution cameras on orbiting mis- 
sions such as the Mars Global Surveyor, which 
launched in 1997, showed valley networks — 
braided and branched channels about 3.7 bil- 
lion years old that seemed to have been carved 
by water. Then in 2005, a spectrometer on the 
Mars Express satellite found widespread evidence
for clays¹ — minerals that testify to hundreds
or even thousands of years of exposure
to water. Suddenly, geologists did not seem so 
bold in tracing out the palaeo-shorelines of 
an ocean that could have covered the planet's 
entire low-lying northern hemisphere. 

But Head and others have countered that 
view with three main lines of evidence. The 
first comes from models of the ancient Martian 
climate that fail to predict temperatures high 
enough for rain, or for liquid water to persist 
on the surface at all. The young Sun was fainter 
than it is today, and even if the young Mars 
had a thicker atmosphere, its greenhouse effect 
would probably not have warmed the planet 
above freezing, says Francois Forget at the 
University of Paris. He 
climate model for Mars so far. It predicts that
any water on Mars would have been bound as 
ice at higher elevations. 

By next month's conference, Forget hopes 
to incorporate the effect of greenhouse gases, 
such as sulphur dioxide, that could have been 
present occasionally in huge doses. He hopes 
to test the idea, proposed by Head, that sul- 
phurous bouts of volcanic activity could have 
warmed the atmosphere for brief periods, just 
enough to melt the icy highlands and unleash 
torrents that could have carved the valley net- 
works. Other researchers invoke local melting 
from the heat from large asteroid impacts. 

Closer inspection of the valley networks 
supports the sporadic presence of water rather 
than an enduring wet climate, Head says. He 
points to studies, including his own, showing 
that some of the networks are isolated geographically
and in time, having formed hundreds
of millions of years apart².

Even the clay minerals may not support a 
wet planet. A team using a spectrometer on 
the Mars Reconnaissance Orbiter found that 
roughly 80% of the clays occur together with 
other minerals that form at relatively high tem- 
peratures. This suggests that the clays formed 
not in cool surface water but underground, 
in water warmed by leftover heat from Mars’s 
formation, says Bethany Ehlmann, a planetary 
scientist at the California Institute of Technology
in Pasadena, who led the study³.

Curiosity, the NASA rover that is expected 
to land on 5 August, may get a closer look. A 
thin ring of clay encircles a 5-kilometre-high 
mound in the middle of the Gale crater, where 
the rover will touch down. Although Ehlmann 
says that these clays are probably among the 
20% that formed with surface water, their tex- 
tures will reveal something about the extent of 
that water. Were the clays deposited in a persis- 
tent, deep lake? Or could the waters have been 
shallow and short-lived? Some geologists have 
even suggested that the minerals could have 
formed in the presence of ice. 

Andrews-Hanna says the shift in thinking 
doesn't rule out life on ancient Mars, but instead 
drives it deeper underground. “If Mars’s climate 
was never stable, that would have been a chal- 
lenge for life,” he says. “But as you go deeper 
in the subsurface, things become more stable.”

Head doesn't think his revisionism causes 
too many problems for ancient life even at the 
surface. He says ancient Mars could have been 
very much like the dry valleys and ephemeral 
lakes of Antarctica, where dried-up mats of 
algae bloom in the warmer season, nourished 
by a trickle of meltwater from the icy highlands
above. “The public perception is that warm 
and wet equals life,” he says. “If you look at the 
range of life on Earth, there’s no reason to suspect
that life is limited to that.” ■


1. Poulet, F. et al. Nature 438, 623-627 (2005). 

2. Fassett, C. I. & Head, J. W. III Icarus 195, 61-89 (2008).

3. Ehlmann, B. L. et al. Nature 479, 53-60 (2011).
The Square Kilometre Array would be the world’s most powerful radio telescope (artist’s impression).


Giant telescope may 
get two homes 


Split-site solution could allow both Australia and South 
Africa to host parts of the Square Kilometre Array. 


BY GEOFF BRUMFIEL 


With the battle to host the world’s
most powerful radio telescope
growing increasingly acrimonious,
the project’s leaders are considering whether to
divide the spoils.

The US$2.1-billion Square Kilometre Array 
(SKA) would open a window on the early 
Universe. As yet, international partners have 
not committed to covering the hefty price tag. 
But if the project goes ahead, it would bring a
flood of funding, prestige and scientific opportunities
to one of the two competing teams:
South Africa or joint bidders Australia and 
New Zealand. 

Last month, after considering the scientific 
merits of the two sites, the SKA Site Advi- 
sory Committee concluded that South Africa 
offered marginally better opportunities (see 
Nature http://doi.org/hst; 2012). Since then, 
the already-intense lobbying from both sides 
has increased, with politicians from each 
country insisting that they would prevail. 


Now, the SKA management board has asked
a new scientific panel to determine whether 
the telescope, made up of 3,000 15-metre- 
wide dish antennas and many more simpler 
antennas, could be divided between the two 
proposed sites. 

Politicians in Australia and South Africa say
that they oppose any split in the project, but
John Womersley, head of the SKA board and 
chief executive of the UK Science and Technol- 
ogy Facilities Council, says that a compromise 
may be one way to resolve the battle. “I have 
heard astronomers that I respect say that such 
a solution is possible,” Womersley told Nature.

Astronomers plan to use the SKA to meas- 
ure radio signals at a frequency of around 
1 gigahertz, probing early galaxy formation 
and investigating how gravity behaves near 
black holes. Capable of detecting a television
signal from up to 15 parsecs (50 light years) 
away, the telescope might even aid in the 
search for extraterrestrial intelligence. Super- 
computers will integrate the signals from the 
dishes and antennas, creating a virtual dish 
with a combined collecting area of 1 square 
kilometre (see Nature 480, 308-309; 2011). 

“Normally when you have a giant dish, 
you cannot split it, but the SKA has many 
different components,” says Heino Fal- 
cke, a radio astronomer at Radboud Uni- 
versity in Nijmegen, the Netherlands. The 
easiest solution would be to divide the 
project by placing the higher-frequency 
dishes on one continent and the lower- 
frequency antennas on the other. 

Doing so would almost certainly raise the 
SKA’s price tag, because computing centres and
power would be needed in both remote loca- 
tions, says Albert Zijlstra, director of the Jodrell 
Bank Centre for Astrophysics in Manchester, 
UK. But splitting the antennas by frequency 
would avoid the need for a high-throughput 
data link connecting the two sites, something 
that Zijlstra and Falcke both expect would be 
even more costly. 

Zijlstra says that he can see little scientific 
advantage to splitting the project, aside from 
a few extra hours of observation time gained 
from having the telescope’s two parts separated
by such a vast distance. Falcke agrees. “It’s a 
question of politics,” he says. 

The panel is expected to deliver its results by 
mid-May, when the SKA board will meet again 
to discuss the site. ■



154 | NATURE | VOL 484 | 12 APRIL 2012 


© 2012 Macmillan Publishers Limited. All rights reserved


MORE ONLINE

MORE NEWS
Social status changes gene expression in monkeys go.nature.com/mkjjbo
Magnetic storms spotted on Venus go.nature.com/l4wqgw
Astronomers find solar system more crowded than ours go.nature.com/s4ejgr
United Nations to appoint a chief scientific adviser go.nature.com/tnk229

Q&A
Private-spaceflight pioneer Elon Musk sets his sights on Mars go.nature.com/w99pls


FEATURE | NEWS | 


COMING OF AGE


Researchers in 
Britain have tracked 
thousands of children 
since their birth in the 
1990s. Now the study 
is 21, and turning to 
the next generation. 


BY HELEN PEARSON 


In a secure storage barn on the outskirts
of Bristol, nearly 9,000 placentas float in
plastic buckets of formaldehyde.
Twenty-five kilometres away, in the basement
of a university building, baby teeth
from more than 4,000 children fill cardboard
boxes in a walk-in freezer. Next door are
some 15,000 nail clippings and 20,000 locks of 
hair. A few steps farther, a parade of freezers 
house row upon row of bar-coded blood cells, 
plasma, urine, saliva and chunks of umbilical 
cord that together make up a tissue library with 
more than one million entries. 

This library is the harvest from an unusual 
study of humanity. In 1990, researchers started 
to collect tissues and detailed information 
from more than 14,500 pregnant women in 
this western British city and its surrounding 
region of Avon. The women filled in more 
than 100 pages of questionnaires about their 
health, relationships, work and home. After 
birth, researchers tracked the children’s devel- 
opment through surveys, clinical examina- 
tions and biological samples. They know 
what the kids ate, when they first talked, how 
often they fell sick and when a parent read to 
them — or deserted them. They know when 
the children started to hit puberty, drink alco- 
hol, have sex and leave home. In that wealth of 
data — collected at a cost of some £42 million 
(US$67 million) so far — they are tracing how 
genetic and environmental factors in the chil- 
dren’s early years affect their later ones. 


Amy Murdoch-Davis is one of 14,000 people who have been studied since birth in and around Bristol, 
UK; her baby, Esmé, will soon join the study, too. 


Now, the Bristol cohort study is coming of 
age, literally and scientifically. This spring, 
the first members of the group turn 21. And 
on 18 April, leaders of the study and their col- 
laborators from around the world will meet 
in Bristol to discuss what they have learned 
from the Avon Longitudinal Study of Parents 
and Children (ALSPAC), also known as the 
Children of the 90s. The rich collection of 
tissues and behavioural data — a compre- 
hensive phenotype for each of thousands of 
participants — makes the study unique and 
invaluable, say its leaders. “It’s the deepest 
phenotyping and biobank resource of any 
large birth cohort — unequivocally,” says 
George Davey Smith, the study’s scientific 
director. The results have generated more 
than 700 scientific papers and range from 
policy-changing health advice for pregnant 
women and young children to the discovery 
of genetic factors involved in fetal growth, 
obesity, allergies and bone density. The
study has also inspired and guided other 
birth cohorts, including the world’s largest,
which is tracking more than 100,000 children
in Norway. 

Yet many of the data and samples remain 
untouched, and the cohort leaders acknowl- 
edge that the most important findings are likely 
to emerge years from now, some of them cour- 
tesy of new techniques that the founders of the 
study could barely have imagined. Davey Smith 
and his colleagues are just starting to analyse 
the children’s genomes — 2,000 of which have 
been fully sequenced — and epigenomes, 
the catalogue of chemical footprints left on a 
child's DNA by experiences in the womb or in 
the world. The researchers say that such studies 
will help them to move from a slew of loose epi- 
demiological associations — between a moth- 
er’s fish-eating and her child’s intelligence, for 
example — to the genetic and epigenetic play- 
ers responsible for those links. 

As the children themselves become parents, 
the team is expanding the scope of the study to 
a new generation. Still, at the heart of all this
remains an unanswered question: how much 
value is there in collecting a huge amount of
information about human life when the scientific
pay-off is unknown? Marcus Pembrey,
who was director of genetics at ALSPAC until 
2005, recalls that when he initially told col- 
leagues that he wanted to measure “every- 
thing” about the children, “they laughed and 
said you can’t study everything. And I said you 
have to study everything. You don’t know what 
will be important in the future.” 


CONCEPTION AND BIRTH 

Today, the effort to gather everything is 
headquartered in a concrete slab of a build- 
ing in central Bristol. On the ground floor, a 
brightly painted annex houses a clinic where 
the children of the 90s come for their regular 
medical examinations. A few middle-aged 
men — the dads of the cohort members — are 
here, going from room to room for blood- 
taking, cognitive testing and bone scans. The 
study scientists are bringing in as many dads as 
they can to collect clinical data such as DNA, 
height, weight and blood pressure, in an effort 
to ramp up studies on the fathers’ health, as 
they already have for the mothers. 

Also in the clinic is 19-year-old cohort 
member Amy Murdoch-Davis, with her five- 
month-old baby Esmé. Last month, ALSPAC 
received provisional ethics approval to start 
recruiting all the children of its cohort mem- 
bers, an effort it hopes to launch in earnest later 
this year. Esmé will be one of the first signed 
up. “I’ve been a guinea pig all my life,” Mur- 
doch-Davis says, “and she can be a guinea pig 
too.” This time around, the researchers want
to collect even more samples, including breast 
milk and the babies’ first stools. 

The Bristol study has banked more than 1 million samples including blood, urine, saliva and placentas.


The study itself was a struggling infant when
Murdoch-Davis was born. Its leader, Jean
Golding, a mathematician turned epidemiologist,
had studied stillbirths and neonatal deaths
and was eager to learn how events in pregnancy
and infancy affect a child’s health and development.
She was convinced that studying a large
cohort, starting in pregnancy and gathering as
much information as possible, was the best way
to find answers. Funders, however, saw it differently.
“Everybody thought ‘if you have a cohort
study you’re tied down to carrying on funding
it — and it’s a bottomless pit’,” she says.


Golding eventually 
scraped together some 
money by writing grant 
applications that focused 
on specific diseases. She 
drummed up interest 
from mothers by talking 
on television and radio,
and sending an army of
midwives to antenatal 
classes and doctor’s surgeries. The 14,541 
pregnancies eventually included in the study 
encompassed more than 70% of those eligible 
in the region between April 1991 and Decem- 
ber 1992. The buckets of fresh placentas started 
stacking up. And so did other data, such as 
blood squeezed from the babies’ heels during 
the early clinic visits and reams of question- 
naires filled out by the mothers (see ‘Building 
a bank of life’). Little was left unasked. Does 
baby drink breast milk or formula? Has he or 
she had antibiotics or skin ointment? Do you 
have a telephone; a tumble dryer; cockroaches? 

Within a year or two of starting, however, 
the cash was draining away. Golding employed 
her 40-50 staff one month at a time, never
sure whether she would have enough to pay
them again. And there was no time or money 
to analyse the data that were flooding in. “We 
were well in the red,” Golding says. “I was just 
exhausted.” 


CHILDHOOD AND ADOLESCENCE 

A few years later, just as parents were collecting 
baby teeth from beneath thousands of pillows 
and mailing them to the research team, the 
first data analyses started coming through. 
Researchers now point to a handful of results 
for which the cohort is famous, and that had 
an impact on public-health policy. One showed 
that eating oily fish during pregnancy was associated
with better eye and cognitive development
in children. Another helped to cement
advice that babies should be put to sleep on 
their backs to reduce the risk of cot death, by 
showing that this sleeping position did not 
cause any developmental delays. A third
showed the first association between peanut 
allergy — an emerging epidemic in Western 
countries — and peanut oil in baby lotions.
Manufacturers soon started identifying the 
ingredient on labels. 

As the children entered their second dec- 
ade, the study found itself on firm financial 
ground for the first time. The Wellcome Trust 
and the Medical Research Council awarded it 
core funding that has totalled some £21 million 
for 2001-14, and investigator-led grants have 
brought in much more. 

With the study maturing comfortably, 
Golding retired in 2005. (“They stopped paying 
me,” as she puts it, still firmly ensconced in her
office at the University of 
Bristol.) Lynn Molloy, a 
social scientist, took over 
the managerial reins, and 
Davey Smith, an epidemi- 
ologist passionate about 
genetics, grasped the sci- 
entific ones. 

The study entered a 
productive scientific 
adolescence even as the 
cohort members suf- 
fered their own teen 
agonies. Questionnaire 
data revealed that 19% of 16-17-year-olds 
cut or otherwise hurt themselves. By age 18, 
some 5-10% had experienced some form of 
psychosis — “more common than people had 
previously suspected”, says Stanley Zammit, a 
psychiatrist working with cohort data at Britain’s
Cardiff University — even though few are
likely to go on to develop schizophrenia or a 
related condition. 

At the same time, genetics was finally 
entering the picture. In the early years of the 
project, examining the children’s genotypes 
was just a fond hope. Scientists had announced 
a draft human genome in 2001, but that was a 
multibillion-dollar megaproject. Quite what to 
do with several thousand raw human genomes,
no one knew. “The interesting thing was that
although all these high-powered alpha-male
geneticists were talking the big talk about 
genetics — when you actually gave them DNA 
for a thousand individuals it was too much. 
They didn’t have the technology,” Golding says.

In 2007, however, data from ALSPAC and 
several other large human biobanks were used 
to scour the human genome for single-letter 
variants associated with obesity. The work 
turned up a gene called FTO, and found that 
adults with two ‘risk’ copies of this gene are 
about 3 kilograms heavier, on average, than 
those with no risk copies. The discovery
became a poster child for the identification 
of risk alleles through genome-wide associa- 
tion studies, a technique that was sweeping 
through the genetics community at the time. 
The Bristol cohort — with its extensive bank of 
DNA and medical data — was perfectly placed 
to catch the wave. Its data have been used to 
identify genetic sequences associated with fetal 
growth, bone density and childhood growth,
tooth development, facial features and more.
Researchers are now hunting for genetic links 
to intelligence, educational attainment and 
gender orientation. 

Davey Smith sees other opportunities to 
explore associations in the cohort’s DNA. He 
finds it frustrating that epidemiological studies 
often reveal a correlation between two factors 
— poverty and obesity, say — without proving 
that one causes another. The trouble is, many 
social and biological factors tend to correlate 
anyway: people who smoke also tend to drink 
more alcohol, eat unhealthy food, be poorer, 
weigh more and have high cholesterol and 
other signature biomarkers of ill health. 

One way to filter true causes from the corre- 
lations is to compare one cohort with another, 
something Davey Smith did recently to find 
‘causal associations’ with breastfeeding. In the 
Bristol cohort, breastfeeding is correlated with 
less obesity, lower blood pressure, higher intel- 
ligence and more good things besides. But in 
the United Kingdom, breastfeeding mothers 
are also more likely to be middle- or upper- 
class. So is it breastfeeding that helps children, 
or some other aspect of a comfortable life? 
When Davey Smith and his colleagues looked 
at a birth cohort based in Pelotas, Brazil, where 
breastfeeding is not linked to socioeconomic 
status, the links with obesity and blood pressure
fell apart, but the link with IQ held up.

Researchers are also finding true causes 
through Mendelian randomization, a genetic 
technique that is exciting epidemiologists. 
Stephanie von Hinke Kessler Scholder, an 
economist at the University of York, UK, is 
using this method to unpick the associations 
between children’s obesity and poorer perfor- 
mance in school. Obesity is also correlated 
with lower socioeconomic status, so Scholder 
sought to test whether obesity directly hin- 
ders performance (because of bullying, for 
example), whether kids who are obese do less
well because they come from disadvantaged
families, or whether some other factor could 
explain the correlation. 

Scholder took the banked DNA of the 
ALSPAC children and divided them up into 
groups on the basis of the make-up of two 
weight-related genes, FTO and MC4R. (Chil- 
dren with all ‘heavy’ copies of these genes weigh 
a few kilograms more on average than those 
with all ‘light’ copies.) Gregor Mendel’s laws 
of inheritance ensure that the heavy and light 
forms of the genes are shuffled and randomly 
delivered to children across a population, irre- 
spective of their social class or any other con- 
founding factor. And when Scholder removed 
confounders from the equation in this way, she 
found that children with the heavy versions of 
FTO and MC4R did just as well on school tests at
age 14 as those who didn’t have them. Dispelling
the false idea that fat children do worse at school
is “a positive thing”, she says.
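
The reasoning behind Mendelian randomization can be illustrated with a small simulation. What follows is a minimal sketch in Python, not the analysis Scholder ran: the cohort is simulated, the allele frequency, effect sizes and sample size are invented for illustration, and the estimator is the simple Wald ratio (the genotype's association with test score divided by its association with weight). Because the simulated genotype is assigned independently of family disadvantage, the ratio should sit near the true causal effect even when the naive slope of score on weight is biased by confounding.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n=20000, causal_effect=0.0, seed=1):
    """Simulate a hypothetical cohort of n children.

    causal_effect is the true change in test score per unit of weight.
    Every numeric constant below is an illustrative assumption.
    """
    rng = random.Random(seed)
    genotype, weight, score = [], [], []
    for _ in range(n):
        # Confounder: family disadvantage raises weight and lowers scores.
        disadvantage = rng.gauss(0, 1)
        # Number of 'heavy' alleles across two genes (0 to 4), drawn
        # independently of disadvantage, as random inheritance implies.
        heavy = sum(rng.random() < 0.4 for _ in range(4))
        w = 1.0 * heavy + 1.0 * disadvantage + rng.gauss(0, 1)
        s = causal_effect * w - 1.0 * disadvantage + rng.gauss(0, 1)
        genotype.append(heavy)
        weight.append(w)
        score.append(s)
    return genotype, weight, score


def naive_slope(weight, score):
    """Ordinary regression slope of score on weight (confounded)."""
    return covariance(weight, score) / covariance(weight, weight)


def wald_ratio(genotype, weight, score):
    """Mendelian-randomization estimate using genotype as the instrument."""
    return covariance(genotype, score) / covariance(genotype, weight)


if __name__ == "__main__":
    g, w, s = simulate(causal_effect=0.0)
    print("naive slope:", round(naive_slope(w, s), 3))
    print("Wald ratio: ", round(wald_ratio(g, w, s), 3))
```

With `causal_effect` set to 0, the naive slope should come out clearly negative while the Wald ratio stays close to zero, which is the same qualitative pattern as the school-test result described above. The sketch assumes the genes influence test scores only through weight; if that assumption fails, the ratio is no longer a clean causal estimate.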

The most technologically advanced answer 
to the causation problem sits upstairs in the 
Bristol headquarters of the study. There, a 
£300,000 machine is poised to start rapid analyses
of methylation — an epigenetic mark that
controls gene activity. By looking at 450,000 
sites in the cohort members’ genomes, the 
researchers will build up a bank of methylation 
data on blood samples taken from the children 
at birth, at ages 7 and around 17, and on the 
mothers during pregnancy and 17 years later. 
“It’s a unique resource,” says Davey Smith.

In a study published last month, Caroline
Relton at Newcastle University, UK, who is 
leading the epigenetic work for ALSPAC, 
looked at methylation on an array of genes in 
the umbilical-cord blood of cohort children. 
She found that methylation signatures on 
nine genes at birth showed some association 
with body height and tissue composition at 
age nine. The finding hints that events dur- 
ing pregnancy might shape gene expression 
early in life — eventually resulting in altered 
growth and weight gain. It also “provides a 
flavour of the sort of thing we can address 
more powerfully when we get this huge data 
set under way”, Relton says. Next, the team
plans to search for methylation patterns 
linked to factors such as neural development, 
behavioural problems, cardiovascular health 
and fatty liver disease. The researchers will
also look for potential causes of these methylation
patterns — associations with a mother’s
smoking, alcohol intake, weight gain during
pregnancy, folate levels and exposure to air
pollution.


Jean Golding, pictured here with cohort children in 2001, started the study in the early 1990s.


NEXT GENERATION 
The methylation robot is just one piece of the 
high-tech equipment involved in the study. 
In one lab in Bristol, blood cells taken from 
the dads that day are being spun down and 
divided up into aliquots, ready for storage. In 
another, researchers are 
transforming cell sam- 
ples into immortal cell 
lines. They have already 
banked around 7,000 
such lines, which pro- 
vide an endless supply of 
DNA and might be used 
for future studies of cell 
behaviour. Sue Ring, the 
head of the laboratories, 
says that her team take 
turns to carry an emer- 
gency mobile phone that 
will warn them of a freezer failure. After all that
the participants have given to the study, she
says, she feels a “duty of care for the samples”.
Soon, the labs will start to process a whole 
new set of tissues — from the children of the 
cohort members. The eggs that will develop 
into these children were formed when their 
mothers themselves were babies, growing 
in their own mother’s uterus. This means 
that events in the grandmothers’ lives, such 
as an infection, stress or exposure to toxic 
chemicals — all recorded in the cohort data- 
base — could produce signals in the streams of 
data that the researchers plan to collect about
the grandchildren. “I was going to retire and
wind down until we stumbled on these transgenerational
events,” says Pembrey, who wants
to explore the third-generation consequences 
of traumatic events in the lives of grandparents. 

With so much yet to do, has the study justified 
the years of effort put in so far? Cohort stud- 
ies sometimes draw criticism because “they are 
very big and very long-term and they seem to 
be a lifelong source of income for investigators”,
says Teri Manolio, who has worked on cohorts 
in the past and is now director of the Office of 
Population Genomics 
at the National Human 
Genome Research Insti- 
tute at Bethesda, Mary- 
land. 

But ALSPAC receives 
kudos from many 
researchers, including 
Nigel Paneth, an epidemiologist
at Michigan State University in East
Lansing, who runs a 
data-collection site for 
the US National Chil- 
dren's Study, which started recruiting in 2009. 
Unlike ALSPAC, the National Children’s Study 
has ballooned into a mammoth programme 
with costs estimated at as much as $6 billion, 
and has struggled for more than a decade to 
convince sceptical scientists and funders that it 
will pay off scientifically. Paneth says that big 
cohort studies can make major contributions 
to health if they are well planned and executed. 
“Can I guarantee results? Of course I can't. But 
I can guarantee you no results if we don’t do it.”

In Bristol, the team is only starting to learn 
the value of some of the data collected decades 
ago. The placentas, which had to be moved
twice, were regarded mostly as a “bloody
nuisance” until three years ago, when David 
Barker at the University of Southampton, UK, 
and Oregon Health and Science University in 
Portland, asked to use them. Barker is famous 
for drawing connections between early fetal 
development and adult health; in the past 
few years he has reported that the size and 
shape of a placenta — the life support of the 
fetus — is associated with risk of adult coro- 
nary heart disease, high blood pressure and 
even lifespan. He sent a photographer to Bristol
to snap the placentas, and is now using the
organs’ dimensions to test some of his hypoth- 
eses. The correlations are “definitely there and 
amazingly strong”, Barker says.

Other correlations will become clear only 
when the study and its participants have 
grown up a little more. They do, after all, have
most of their lives ahead of them. Murdoch- 
Davis is planning the next step in hers: going 
to university to study English and psychology. 
Meanwhile, Molloy is mapping out what data to 
collect when the cohort members next visit the 
clinic, aged 24-25, when they will be close to 
the peak of their health. And Davey Smith has 
faith in future generations of scientists to find 
new ways to study the cohort. “My hope is that 
when I drop dead it’s got an energetic scientific 
director who’s implementing all these future 
technologies that I couldn't imagine,” he says. 

Perhaps, Davey Smith says, someone will be 
analysing data from the digital video record- 
ers that he’s thinking about handing to each 
expectant family, to document every significant 
event in their child’s life. After all, storing tera- 
bytes of data doesn't cost anything — and you 
never know what they might one day reveal. 
“Digital recording allows you to store up huge 
banks of data for future use,” he says. “It’s a bit 
like Jean collecting the placentas. No one knew 
what they were going to do with them.”


Helen Pearson is Nature’s chief features 
editor.


Bold strategies
for Indian science 


For a nation of its talent and education, India deserves higher scientific standing. 
It needs clear and honest leadership, not more money, says Gautam R. Desiraju. 


When an Indian prime minister
publicly admits that India
has fallen behind China, it is
news. Manmohan Singh’s statement last
January at the Indian Science Congress in 
Bhubaneswar that this is so with respect to 
scientific research, and that “India’s relative 
position in the world of science has been 
declining”, has rung alarm bells. Singh was
not springing anything new on Indian sci- 
entists; many of us will admit that things are 
not well. Recognizing the problem is the
first step towards reversing this slide. 

At present, India has a trickle-down 
strategy, in which elite institutions are
supported in the hope that good science
there will energize the masses, and a bottom- 
up approach, in which the general public is 
targeted with schemes to popularize science. 

These approaches have converged with 
the setting up in recent years of tens of new 
universities, institutes and centres of higher 
learning, even though many hundreds more 
are desirable for a country of India’s size. 
Although there was, curiously, no increased 
allocation to science in this year’s Indian 
budget, there is hope that, as the prime min- 
ister has declared, things would improve if 
government support were increased to 2% 
of the gross domestic product (it now stands
at 0.9%). But it is a haphazard plan, with no
hint of new strategies. The assumption is 
that the answer to our problems lies simply 
in more money. 

As someone who has worked in India 
for 34 years, I am impatient with our slow 
progress. At the glitzy level, we have had
no Nobel prize winner since C. V. Raman 
in 1930, no highly Shanghai-ranked uni- 
versity, no miracle drug for a tropical dis- 
ease and no sequencing of the rice genome. 
At the industrial level, there have been no 
breakthroughs to rival the telephone, the 
transistor or Teflon. At the organizational 
level, we do not have a postdoctoral
system worth its name, and our undergraduate
teaching system is in a shambles.
We figure occasionally in the best journals, 
yet we tolerate plagiarism, misconduct and 
nepotism. And yet, the innate abilities and 
talents of India are palpable. Why is it that 
this country has not been able to harness its 
strengths into deliverables? 

Money is not the primary constraining 
factor in our problems, nor will an abun- 
dance of it solve them. More money is 
undoubtedly better, but if there are deep cul- 
tural and social problems, extra money will 
simply drain away. The rate of any improve- 
ment will not match the rate of increase in 
investment. Big problems in big countries
usually emanate from a small number of core 
reasons. An understanding of these reasons 
in the context of Indian science should stem 
from an appreciation of the country’s histori- 
cal, economic and sociological profile. 

It is not enough for the prime minister to 
resort to platitudes by saying (as in his recent 
speech) that “things are changing but we can- 
not be satisfied with what has been achieved”, 
or that we should make “scientific output 
more relevant”. He and his advisers must ask 
themselves if there are underlying causes for 
this lack of satisfaction and relevance. Until 
then, no amount of bankrolling, populism, 
bureaucrat bashing or whistle-stop tours by 
prominent Western scientists will help. 


A FEUDAL MINDSET

Two aspects of the Indian psyche are 
particularly troubling for a country seek- 
ing its rightful place in the modern world. 
Our cultural value system, backed by Hindu 
scriptural authority, has created a strongly 
feudal mindset among Indians. Centuries 
of servitude, right up until 1947, have 
made the average Indian docile, obedient 
and sycophantic. ‘Behave yourself and be
rewarded’ is the pragmatic mantra. I believe
this feudal–colonial mentality has had far-reaching
and debilitating consequences for
research. 

The first is our lack of the ability to 
question and dissent that is so essential to 
science. Most of the faculty in our better 
institutions have done postdoctoral work in
a foreign laboratory of consequence. Unlike 
young scientists in advanced countries, 
however, newly returned Indian lecturers 
typically relive their golden moments as 
postdocs throughout their research careers. 
The best research papers from India may 
be competent, but they do not inspire or 
excite. Very few Indian scientists are known 
as opinion-makers, trend-setters or leaders. 
They follow obediently. 

Another consequence of this feudal 
mindset is our unquestioning accept- 
ance — bordering on subservience — to 
older people. In this part of the world, 
age is blindly equated with wisdom, and
youth with immaturity. This facilitates the
continuance of the status quo. Geriatric 
individuals with administrative and political 
clout reinforce their positions so well that we 
are unable to eject them. So we hail scientists 
in their eighties, film actors in their seventies 
and cricketers in their forties. 


VARIANTS OF CORRUPTION 

In healthy organizational hierarchies, the 
decision-makers are also active participants 
who have a stake in the future. We will have 
come of age only when Indian universities 
are allowed to appoint their own vice- 
chancellors, and institutes and national 
laboratories their own directors, rather than 
suffer the choices made by conclaves of old 
men in New Delhi. 

The most important decisions in an 
academic system concern the appoint- 
ment of faculty. This procedure is flawed 
in India. For a start, selection committees 
consist mostly of outsiders, and represen- 
tation from within is often restricted to 
institutional and departmental heads. In 
the smaller state universities, all sorts of 
irregularities occur in the name of caste- 
based reservations. In the more influen- 
tial central institutions, appointments are 
often made incestuously, with students of 
a few senior researchers filling a disproportionately
large number of vacancies, or [...]
laboratory it can mean
acquiescing to the notion that one’s admin- 
istrative head is also one’s scientific superior. 
By that logic, and given our civilian-based 
system of administration, the secretaries in 
the science ministries in Delhi should be our 
most creative scientists. 

These variants of corruption — along 
with general indifference, absence of incisive 
introspection, old-boys’ networks, admin- 
istrative vindictiveness, vagaries in research 
funding and studied silences — conspire to 
create an atmosphere that lacks innovation 
and creativity. Impact factors and h-indi- 
ces become the sole arbiters of scientific 
excellence in such an environment. If policy- 
makers are ignoring cultural parameters, 
scientists are looking only at numerical 
parameters. 

The true measures of a country’s scien- 
tific strength are found in the numbers of 
competent teachers and lively students 
in schools and undergraduate colleges, 
because these translate into real gains in the 
future. Fluffy factors, such as the numbers of
articles in Nature and Science, do not tell the
real story. As a chemist, I would say that it is 
better to move deliberately and confidently 
towards the thermodynamic minimum (a 
system's most stable state, which has the 
lowest energy but is not always the easiest to 
achieve) rather than flit anxiously between 
any number of metastable kinetic states 
(which are easy and fast to access but have 
higher energies). 

A large country with a well developing 
economy can afford this long-term strategy 
and vision. China need not be a comparison
point — India is endowed enough to seek its
own solutions for its problems.


THE WAY FORWARD 

I suggest that our policy-makers consider 
the following. First, provide modest funding 
to a very large number of small, single- 
investigator, blue-sky projects — including 
those in state universities — to achieve a 
critical density of ideas and a feeling of mass
participation and enthusiasm. 

Second, provide heavy and directed fund- 
ing into a few specific projects of national 
importance — such as energy, water and 
public health — with high levels of account- 
ability and proper exit options. Third, 
reduce or abolish the present system of 
awards, prizes and recognitions in higher- 
level science. This would dissuade younger 
scientists from chasing awards rather than 
doing good science, and it would reduce the 
influence of the cliques who allocate prizes. 

To reach a stable solution, we can employ 
longer-term measures that include modifi- 
cation or removal of caste-based quotas and 
reservations in the educational and research 
sectors; improvement of undergraduate 
teaching institutions and teaching laborato- 
ries with respect to greater uniformity and 
transparency; and clear identification of 
paths towards scientific and administrative 
growth for individuals. 

Money is neither the cause nor the 
solution to our problems, although it can 
facilitate progress in an otherwise healthy 
climate. What is lacking in India is the 
quality of leadership and the level of honesty 
that are required for a breakthrough. When 
will this country see another C. V. Raman?


Gautam R. Desiraju is a professor of 
chemistry in the Indian Institute of Science, 
Bangalore 560 012, India. He is president of 
the International Union of Crystallography. 
e-mail: desiraju@sscu.iisc.ernet.in


Rapid economic growth means that the air in some Chinese cities, such as Beijing, contains more fine particles than the World Health Organization recommends.


Cleaning China’s air 


To reduce airborne soot, organics and sulphates, tailored strategies for each must 
be established and coal use limited, say Qiang Zhang, Kebin He and Hong Huo. 


On 29 February this year, China’s State
Council approved its first national
environmental standard for limiting
the amount of fine particles in air that meas- 
ure less than 2.5 micrometres in diameter. It 
requires the country to implement the World 
Health Organization’s recommended interim 
target of an annual average of 35 micrograms 
per cubic metre (gm *) for such particles by 
the end of 2015. 

Fine particles — including soot, organics 
and sulphates — have a severe effect on 
human health and are implicated in climate 
change. They are emitted by combustion 
and industrial processes, and formed from 
the reactions of gaseous pollutants. If implemented 
properly, China’s air-quality standard 
would have far-reaching benefits: as well as 
protecting human health, it would reduce 
air and mercury pollution in the Northern 
Hemisphere and slow global warming. 

Achieving this goal will be a challenge. 
Some Chinese cities currently have fine- 
particle concentrations that are well above 
the proposed standard: levels of more than 
100 µg m⁻³ have been reported. To meet 
the ambitious air-quality limits, China will 
have to overcome two major hurdles: its 
relentless increase in fossil-fuel use, which 
quickly wipes out any efforts to reduce 
emissions, and its decentralized system of 
environmental enforcement, which gives 
undue influence to local officials who favour 
economic development. 

Controlling air quality in China will 
address global environmental issues. For 
example, pollutants from east Asia that 
travel across the Pacific increase ozone concentration 
in the western United States. 
This could be relieved if China reduces 
emissions of nitrogen oxides (NOₓ), which 
are precursors of fine particles and ozone. 
Cross-border pollution by airborne particles 
would similarly be reduced by cutting China’s 
emissions of sulphur dioxide (SO₂). The 
use of technologies such as desulphurization, 
selective catalytic reduction or electrostatic 
precipitators to reduce Chinese emissions of 
SO₂, NOₓ and fine particles, respectively, can 
remove global pollutants such as mercury, 
which is released by the burning of coal. 

Limiting particle pollution will also affect 
drivers of climate change — but not always 
for the better. On the one hand, reducing 
soot emissions by cutting coal use or using 
cleaner stoves will lessen radiative forcing 
and thus limit warming, benefiting both the 
climate and public health. A stricter emissions 
standard for diesel vehicles, which produce 
soot, is another win-win solution. On 
the other hand, reductions in SO₂ emissions 
from power plants would reduce atmospheric 
sulphate concentrations, thereby increasing 
radiative forcing, which has a short-term 
detrimental effect on climate. Thought 
is therefore needed as to how the various 
pollutants and sources should be best con- 
trolled, and a multi-pollutant abatement 
strategy must be developed. 


A CLEAN CHALLENGE 

The control of air pollution in China is in 
a race with the economy. The country has 
maintained an annual economic growth 
rate of more than 8% for years, largely 
through the energy-intensive construction 
of infrastructure such as highways, railways 
and cities. Between 2005 and 2010, China 
increased its thermal-power generation 
by 63%, pig-iron and cement production 
by 74% and 76%, respectively, and vehicle 
production by 220% (ref. 5). 

Although China has made tremendous 
efforts to limit air pollution, such as requir- 
ing coal-fired power plants to install flue-gas 
desulphurization systems and strengthening 
vehicle-emissions standards, these meas- 
ures have not kept up with the growth of its 
economy and fossil-fuel use. We estimate that 
new equipment reduced SO₂ emissions from 
China’s power plants by 1.5 million tonnes in 
2005 and by 17.5 million tonnes in 2010 — 
54% of the country’s total SO₂ emissions in 
2005 (32.3 million tonnes). But nationwide, 
total SO₂ emissions only decreased by 11% 
(to 28.7 million tonnes in 2010) because 
those from other sectors grew (see ‘China's 
emissions battle’). Coal usage rose by 44% 
(955 million tonnes), more than one-third of 
which was consumed by industrial facilities 
(such as iron, steel and cement works) that 
have no desulphurization systems. 
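
As a rough check on the arithmetic in this paragraph, the quoted figures can be recomputed. The sketch below is illustrative only: the variable names are hypothetical, the 2005 coal baseline is inferred on the assumption that 955 million tonnes is the size of the 44% rise, and the estimate of growth in other sectors assumes that the two equipment figures are directly comparable with the national totals.

```python
# Figures quoted in the text, in million tonnes of SO2.
so2_total_2005 = 32.3
so2_total_2010 = 28.7
removed_by_equipment_2005 = 1.5
removed_by_equipment_2010 = 17.5


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Equipment savings in 2010 as a share of total 2005 emissions (quoted as 54%).
removed_share = percent(removed_by_equipment_2010, so2_total_2005)

# Net nationwide decrease between 2005 and 2010 (quoted as 11%).
net_decrease = percent(so2_total_2005 - so2_total_2010, so2_total_2005)

# Assumption: extra abatement minus the net fall approximates the growth
# in emissions that had to be offset, mostly from other sectors.
extra_abatement = removed_by_equipment_2010 - removed_by_equipment_2005
implied_offset = extra_abatement - (so2_total_2005 - so2_total_2010)

# Assumption: the 955 million tonnes is the size of the 44% rise in coal use,
# which implies a 2005 baseline of roughly 955 / 0.44 million tonnes.
coal_rise_mt = 955.0
implied_coal_2005 = coal_rise_mt / 0.44
implied_coal_2010 = implied_coal_2005 + coal_rise_mt

print(f"Share removed by equipment: {removed_share:.0f}%")
print(f"Net national decrease: {net_decrease:.0f}%")
print(f"Implied growth offset elsewhere: {implied_offset:.1f} Mt")
print(f"Implied coal use 2005 -> 2010: {implied_coal_2005:.0f} -> {implied_coal_2010:.0f} Mt")
```

The recomputed shares agree with the 54% and 11% quoted above. The offset and coal-baseline values are inferences from those figures, not numbers reported by the authors.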

The low priority given to environmental 
protection and the lack of cooperation among 
various government agencies also hamper 
air-quality control. China's Ministry of Envi- 
ronmental Protection (MEP) manages pol- 
lutant discharge, but it is a weak player within 
the government system. Its decisions are 
often obstructed for economic reasons. For 
instance, in late 2011, Chinese oil companies 
caused a delay in the planned 2012 implementation 
of a stricter vehicle-emissions standard 
(equivalent to Europe's Euro IV standard) 
because they were unable to provide the 
necessary low-sulphur oil. Any delay is a big 
strike against the environment, particularly as 
vehicle emissions continue to rise. 

Such stories are rife in local governments, 
which like to promote heavy industry to 
stimulate regional economies. Local environ- 
mental agencies are often forced to back these 
projects just because they are affiliated to local 
governments. Yet, pushed by the public, the 
willingness and enthusiasm of China's gov- 
ernment for curbing air pollution has never 
been so strong. It is a golden opportunity for 
the nation to make a change — to free itself 
from the trap between economic development 
and environmental pollution. 


CHINA'S EMISSIONS BATTLE 

Rising coal use (top) has increased emissions of 
sulphur dioxide from some sectors, even though 
power plants emit less than in 2005 (bottom). 


CONSTRAIN COAL 

Because China will continue to rely on fossil 
fuels for the next 20 years, the government 
should change its thinking. Instead of try- 
ing to use more energy to ensure economic 
growth regardless of the consequences, it 
should promote development with con- 
strained fossil-fuel use. A cap could be set 
for national total coal consumption, and 
economic plans developed under this con- 
straint. Otherwise, emissions from increased 
energy use will offset any gains from emis- 
sions control. Tertiary service industries and 
high-technology projects could be promoted 
instead of energy-intensive ones. 

Greater authority should be given to 
environmental agencies at various levels of 
government. The MEP should be granted 
more power to implement its policies and 
enforce regulations. A vertical administrative 
structure would ensure that local environ- 
mental agencies report directly to the MEP. 

The impact on the global environment 
should be considered when formulating 
China's air-quality strategies, and balanced 
plans should be developed for each pollution 
source. As well as reducing SO₂ emissions, 
the government should endorse measures to 
limit soot. Controlling emissions from diesel 
vehicles should be a priority, and oil compa- 
nies should be brought into accordance with 
environmental standards. For future facilities 
that will control SO₂ and NOₓ emissions, 
the government should equip them with 
specific technologies to remove mercury, 
such as activated carbon injection. 

Addressing air pollution in China is a 
unique platform for researchers in atmos- 
pheric chemistry. Many scientific issues — 
such as secondary organic aerosol formation 
— remain to be explored. Practical control 
technologies for ultrafine particles and vola- 
tile organic compounds must be developed. 
Multinational collaboration is urgently 
needed; the government should make funds 
available to bring outstanding international 
scientists to China to help combat its air-pollution 
challenges. We all stand to benefit. 


Qiang Zhang, Kebin He and Hong Huo 
are professors in atmospheric chemistry, 
environmental science and energy systems at 
Tsinghua University, Beijing, China. 


Q&A Michael Frayn 
The playful dramatist 


Author and playwright Michael Frayn explores the wellsprings of creativity through farce, philosophy and the history of science. His eclectic output 
ranges from non-fiction books such as The Human Touch (2006) to plays including Noises Off (1982) and Copenhagen (1998) — which explores 
the 1941 meeting between quantum physicists Niels Bohr and Werner Heisenberg, with Frayn imagining their discussions on the morality of 
working on nuclear weapons. With his latest novel, Skios, coming out next month, he talks about determinism and the paradox of existence. 


What is Skios about? 

It is a farce about an institute on a Greek 
island that has invited a lecturer to talk about 
the organization of science. The wrong lecturer 
shows up. I light-heartedly touch on 
determinism — the old question of whether 
human contributions to events are predetermined 
or whether they can’t be understood 
in the context of causality. My view is the latter. 
There are two interesting things about 
farce. One is seeing human beings reduced 
to the level of machines, unable to control 
situations. The other is seeing people desperately 
thinking of ways to cope with difficult 
situations, inventing lies that they hope will 
get them out of the difficulties they're in, and 
of course making their difficulties worse. 


Are you poking fun at the idea that human 
thought can be organized? 

A tiny bit. People are endlessly surprised by 
the imagination. I’m struck by something 
that comes into Copenhagen: how physicists 
Otto Frisch and 


nuclear chain reaction. It was because they 
were messing around. Everyone assumed 
that you would need tonnes, and there 
wasn't a chance of producing enough. One 
of the pair realized 

of all speeds. And he worked one out. Then 
the other one said, supposing we did have as 
much fissile material as we wanted, how much 
would be needed? And he applied the formula 
and discovered that it would be a matter of 
kilograms. These researchers weren't addressing 
the specific problem of building a bomb — they 
were working off the tops of their heads. 


Why did you start writing about science? 

I studied philosophy at university, and 
couldn't help but come across scientific 
questions, particularly in connection with 
quantum mechanics. I had always had a 
faint idea of Heisenberg and Bohr’s work, but 
never thought of writing about it until I read 
Heisenberg’s War by Thomas Powers. Why 
did Heisenberg go to Copenhagen in 1941, 
and what were his motives in working on the 
German nuclear programme? Was he actu- 
ally trying to build a bomb? Although there 
is obviously no parallel between this uncer- 
tainty and Heisenberg’s uncertainty princi- 
ple in quantum mechanics, there is a similar 
impossibility of ever reaching beyond a cer- 
tain point. The result was Copenhagen — a 
play about epistemology that happens to be 
played out in terms of science. 


Can drama teach science? 

I don't think the theatre is a very good 
medium for explaining complex ideas. No 
one ignorant of nuclear physics would come 
out of Copenhagen thinking that they under- 
stood it. 


In The Human Touch, you write about how 
the mind constrains our understanding of 
the world. 

It is a paradox: we know perfectly well that 
we’re irrelevant to the process of the 
Universe — but there is nothing we 
can say about the Universe except in 
terms of what we see and think. I’m not 
suggesting that we make it all up arbitrarily. 
We’re constrained by something, but 
it is extremely difficult to say what it is. 


Some scientists would argue with that. 

I can see how resistant scientists are to 
that side of the paradox. I was invited to 
CERN near Geneva, Switzerland, to talk 
about The Human Touch, and it was really 
daunting. They had appointed a jury that 
asked detailed questions. One of the jury 
members said beforehand: “We're going 
to haul you over the coals.” It seemed 
to me — although they were all very 
charming and friendly about it — that 
they were unreconstructed Platonists. 
They believed that numbers and the laws 
of science are objective entities, whereas 
I think that they are constructs that we 
place on the world to understand it. 


As a non-scientist, are you confident in 
writing about science? 

Fortunately, professional science writ- 
ers and scientists have made enormous 
efforts to get through to lay audiences. 
But people like the physicist Richard 
Feynman insist that if you haven't got 
mathematics you're never really going 
to understand physics — it is like try- 
ing to explain music to the tone-deaf. I 
made a lot of mistakes writing Copen- 
hagen, in spite of getting the text read. I 
got letters from scientists pointing out 
basic errors. But I was struck by their 
generous tone. 


How do you approach writing? 

As a writer, you can't think, “I’d like to 
write a play about stem-cell research and 
there will be these characters.” It doesn’t 
work like that: ideas just seem to fall into 
your head out of nowhere, and develop 
of their own accord. So there is resonance 
with the case of Peierls and Frisch, or the 
chemist August Kekulé dreaming about 
the structure of the benzene ring. There 
is an unconscious leap, a synthesis, that 
goes on, even though much science is 
about trying to find a specific answer to 
a specific problem. 


So playwrights run experiments too? 

Plays are called plays for a good reason. 
As a playwright, you are saying, what if 
we had enough uranium-235, or what if 
somebody discovered that their brother 
was their father, and you take over from 
these fictitious situations. It is messing 
around, but messing around often has 
serious results. 


Thomas Kuhn recognized the importance of revolutionary changes, or ‘paradigm shifts’, in science. 


IN RETROSPECT 


The Structure of 
Scientific Revolutions 


David Kaiser marks the 50th anniversary of an 
exemplary account of the cycles of scientific progress. 


The Structure of 
Scientific Revolutions: 
50th Anniversary 
Edition 

THOMAS S. KUHN (WITH 
AN INTRODUCTION BY IAN 
HACKING) 

Univ. Chicago Press: 2012. 
264 pp. $45, £29 


Fifty years ago, a short book appeared 
under the intriguing title The Structure 
of Scientific Revolutions. Its author, 
Thomas Kuhn (1922-1996), had begun his 
academic life as a physicist but had migrated 
to the history and philosophy of science. His 
main argument in the book — his second 
work, following a study of the Copernican 
revolution in astronomy — was that scientific 
activity unfolds according to a repeating 
pattern, which we can discern by studying 
its history. 

Kuhn was not at all confident about how 
Structure would be received. He had been 
denied tenure at Harvard University in Cambridge, 
Massachusetts, a few years before, 
and he wrote to several correspondents after 
the book was published that he felt he had 
stuck his neck “very far out”. Within months, 
however, some people were proclaiming a 
new era in the understanding of science. 
One biologist joked that all commentary 
could now be dated with precision: his own 
efforts had appeared “in the year 2 B.K.”, 
before Kuhn. A decade later, Kuhn was so 
inundated with correspondence about the 
book that he despaired of ever again getting 
any work done. 

By the mid-1980s, Structure had 
achieved blockbuster status. Nearly a mil- 
lion copies had been sold and more than a 
dozen foreign-language editions published. 
The book became the most-cited academic 
work in all of the humanities and social sciences 
between 1976 and 1983 — cited more 
often than classic works by Sigmund Freud, 
Ludwig Wittgenstein, Noam Chomsky, 
Michel Foucault or Jacques Derrida. The 
book was required reading for undergrad- 
uates in classes across the curriculum, from 
history and philosophy to sociology, eco- 
nomics, political science and the natural 
sciences. Before long, Kuhn’s phrase “par- 
adigm shift” was showing up everywhere 
from business manuals to cartoons in The 
New Yorker. 

Kuhn began thinking about his project 15 
years before it was published, while he was 
working on his doctorate in theoretical physics 
at Harvard. He became interested in 


Books in brief 


Experiment Eleven: Deceit and Betrayal in the Discovery of the 
Cure for Tuberculosis 

Peter Pringle BLOOMSBURY 288 pp. £18.99 (2012) 

The 1943 discovery of a drug treatment for tuberculosis did much 
to kick-start big pharma. But this is a knotted tale, deftly unpicked 
by investigative journalist Peter Pringle. We learn that Albert Schatz, 
a US graduate student, found streptomycin in the eponymous 
11th experiment on a farmyard bacterium — but that his research 
director, Selman Waksman, took the credit, along with patent 
royalties and a Nobel prize. A chance rediscovery brought Schatz 
the reputation he deserves. 


The Forever Fix: Gene Therapy and the Boy Who Saved It 

Ricki Lewis ST MARTIN'S 336 pp. $25.99 (2012) 

This popularized examination of gene therapy hinges on a 
breakthrough case: Corey Haas’s recovery from Leber’s congenital 
amaurosis type 2, which had made him virtually blind at the age 
of eight. Medical writer Ricki Lewis interweaves science, the history 
of medical trial and error, and human stories. The contrast can be 
intense, running from the death in 1999 of teenager Jesse Gelsinger, 
from a reaction to gene therapy intended to combat his liver disease, 
to radical successes in some children with adenosine deaminase 
deficiency. 


Internal Time: Chronotypes, Social Jet Lag, and Why You’re 
So Tired 

Till Roenneberg HARVARD UNIVERSITY PRESS 288 pp. $26.95 (2012) 

Time really is of the essence, says medical psychologist Till 
Roenneberg. By neglecting our body clocks — which rarely run in 
synchrony with the crazily cranked-up pace of modern life — we 
can develop “social jetlag”, endangering our health and careers. 
Roenneberg has built his book on decades of research in everything 
from fungi and single-celled organisms to humans. In brilliantly 
minimalist terms, he explains the temporal mismatches behind teen 
exhaustion, early birds and night owls, and sleep phobia. 


Why Animals Matter: Animal consciousness, animal welfare, and 
human well-being 

Marian Stamp Dawkins OXFORD UNIVERSITY PRESS 224 pp. £16.99 (2012) 

Too little science and too much anthropomorphism have made 
our approaches to animal welfare a shambles, says ethologist 
Marian Stamp Dawkins. Her radical rethink involves linking their 
welfare with our own to harness a powerful driver of change: human 
self-interest. Dawkins advises sidestepping the question of animal 
consciousness to focus on animal health and hard-wired ‘wants’ 
such as foraging, to benefit both groups. Also key is never letting up 
on research into our intertwined existences, she says. 


Subliminal: The Revolution of the New Unconscious and What it 
Teaches Us About Ourselves 

Leonard Mlodinow PANTHEON 272 pp. $25.95 (2012) 

Perception “below the threshold of consciousness”, as Carl Jung 
put it, is here pushed into the limelight. Physicist Leonard Mlodinow 
shows how humans have “parallel tiers” of a conscious brain 
superimposed on an unconscious mind. Drawing on research and 
anecdotes, Mlodinow explores the pattern-matching, gap-filling role 
of the unconscious in perception, memory, sociality, emotions and 
self-estimation. An illuminating journey through a hidden world. 


The duck-rabbit figure shows how two pictures can be derived from the same evidence. 


developmental psychology, avidly reading 
works by Swiss psychologist Jean Piaget 
about the stages of cognitive development 
in children. 

Kuhn saw similar developmental stages 
in entire sciences. First, he said, a field of 
study matures by forming a paradigm — a 
set of guiding concepts, theories and meth- 
ods on which most members of the relevant 
community agree. There follows a period of 
“normal science”, during which researchers 
further articulate what the paradigm might 
imply for specific situations. 

In the course of that work, anomalies 
necessarily arise — findings that differ 
from expectations. Kuhn had in mind 
episodes such as the accidental discover- 
ies of X-rays in the late nineteenth century 
and nuclear fission in the early twentieth. 
Often, Kuhn argued, the anomalies are 
brushed aside or left as problems for future 
research. But once enough anomalies have 
accumulated, and all efforts to assimilate 
them to the paradigm have met with frus- 
tration, the field enters a state of crisis. 
Resolution comes only with a revolution, 
and the inauguration of a new paradigm 
that can address the anomalies. Then the 
whole process repeats with a new phase of 
normal science. Kuhn was especially struck 
by the cyclic nature of the process, which 
ran counter to then-conventional ideas 
about scientific progress. 

At the heart of Kuhn’s account stood the 
tricky notion of the paradigm. British phi- 
losopher Margaret Masterman famously 
isolated 21 distinct ways in which Kuhn 
used the slippery term throughout his slim 
volume. Even Kuhn himself came to realize 
that he had saddled the word with too 
much baggage: in later essays, he separated 
his intended meanings into two clusters. 
One sense referred to a scientific community’s 
reigning theories and methods. The second 
meaning, which Kuhn argued was both 
more original and more important, referred 
to exemplars or model problems, the worked 
examples on which students and young scientists 
cut their teeth. As Kuhn appreciated 
from his own physics training, scientists 
learned by immersive apprenticeship; they 
had to hone what Hungarian chemist and 
philosopher of science Michael Polanyi 
had called “tacit knowledge” by working 
through large collections of exemplars 
rather than by memorizing explicit rules or 
theorems. More than most scholars of his era, 
Kuhn taught historians and philosophers 
to view science as practice rather than syllogism. 

166 | NATURE | VOL 484 | 12 APRIL 2012 

Most controversial was Kuhn's claim that 
scientists have no way to compare concepts 
on either side of a scientific revolution. For 
example, the idea of ‘mass’ in the Newtonian 
paradigm is not the same as in the Einstein- 
ian one, Kuhn argued; each concept draws 
meaning from separate webs of ideas, prac- 
tices and results. If scientific concepts are 
bound up in specific ways of viewing the 
world, like a person who sees only one 
aspect of a Gestalt psychologist’s duck-rabbit 
figure, then how is it possible to compare 
one concept to another? To Kuhn, the 
concepts were incommensurable: no com- 
mon measure could be found with which to 
relate them, because scientists, he argued, 
always interrogate nature through a given 
paradigm. 

Perhaps the most radical thrust of Kuhn's 
analysis, then, was that science might not be 
progressing toward a truer representation of 
the world, but might simply be moving away 
from previous representations. Knowledge 
need not be cumulative: when paradigms 
change, whole sets of questions and answers 
get dropped as irrelevant, rather than incor- 
porated into the new era of normal science. 
In the closing pages of his original edition, 
Kuhn adopted the metaphor of Darwinian 
natural selection: scientific knowledge surely 
changes over time, but does not necessarily 
march towards an ultimate goal. 

And so, 50 years later, we are left with 
our own anomaly. How did an academic 
book on the history and philosophy of sci- 
ence become a cultural icon? Structure was 
composed as an extended essay rather than 
a formal monograph: it was written as an 
entry on the history of science for the soon- 
to-be-defunct International Encyclopedia of 
Unified Science. Kuhn never intended it to 
be definitive. He often described the book 
(even in its original preface) as a first pass at 
material that he intended to address in more 
detail later. 

To me, the book has the feel of a physicist’s 
toy model: an intentionally stripped-down 
and simplified schematic — an exemplar — 
intended to capture important phenomena. 
The thought-provoking thesis is argued 
with earnestness and clarity, not weighed 
down with jargon or lumbering footnotes. 
The more controversial claims are often 
advanced in a suggestive rather than declar- 
ative mode. Perhaps most important, the 
book is short: it can be read comfortably in 
a single sitting. 

For the 50th-anniversary edition, the 
University of Chicago Press has included an 
introductory essay by renowned Canadian 
philosopher Ian Hacking. Like Kuhn, Hack- 
ing has a gift for clear exposition. His intro- 
duction provides a helpful guide to some of 
the thornier philosophical issues, and gives 
hints as to how historians and philosophers 
of science have parted with Kuhn. 

The field of science studies has changed 
markedly since 1962. Few philosophers still 
subscribe to radical incommensurability; 
many historians focus on sociological or cul- 
tural features that received no play in Kuhn's 
work; and topics in the life sciences now 
dominate, whereas Kuhn focused closely on 
physics. Nevertheless, we may still admire 
Kuhn’s dexterity in broaching challenging 
ideas with a fascinating mix of examples 
from psychology, history, philosophy and 
beyond. We need hardly agree with each of 
Kuhn’s propositions to enjoy — and benefit 
from — this classic book. 


David Kaiser is Germeshausen Professor of 
the History of Science at the Massachusetts 
Institute of Technology in Cambridge. 

His latest book is How the Hippies Saved 
Physics (Norton, 2011). 

e-mail: dikaiser@mit.edu 


Primate studies: hear 
the public’s views 


A painful irony in the disrupted 
flow of primates to US research 
labs (Nature 483, 381-382; 2012) 
is that the number being used in 
laboratory experiments is at an 
all-time high. 

If the scientific community is to 
maintain the support and trust of 
the public, which funds much of 
its work, then research practices 
and policies should change to 
reflect society's views on what 
constitutes the ethical treatment 
of animals. These changes need to 
be speeded up. 

Committees that review and 
approve animal experiments 
at US facilities should not be 
dominated by those who work 
in animal labs and have vested 
interests in continuing animal 
research (L. A. Hansen et al. 
Animals 2, 68-75; 2012). There 
are too few members of the public 
on these US committees, and 
those who are involved say that 
they are often marginalized. In 
other countries, such as Sweden 
and Australia, half or one-third 
of committees must comprise 
non-scientists and animal-welfare 
representatives. 

If scientists continue to 
disregard the substantial and 
growing public opposition to 
harmful research on primates 
and other animals, more protest 
campaigns are inevitable. 
Lawrence A. Hansen University 
of California, San Diego, La Jolla, 
California, USA. 
lahansen@ucsd.edu 


Primate studies: fix 
welfare issues first 


Record numbers of non-human 
primates are being used in US 
labs, so it is unlikely that limiting 
imports will hold back vital areas 
of research as you imply (Nature 
483, 381-382; 2012). 

A report from the American 
Anti-Vivisection Society 
(AAVS; see go.nature.com/ 
gbqlel) indicates that imports of 
monkeys born to wild-caught 


parents quadrupled during 
1998-2008. Conservationists are 
concerned about global trade in 
crab-eating macaques (Macaca 
fascicularis), the import of which 
has doubled in recent years. 

Scientists must urgently 
address the extreme animal- 
welfare issues surrounding these 
imports. The AAVS report, 
which is based on information 
from US federal agencies and 
scientific studies, has revealed 
that monkeys destined for US labs 
typically endure long, gruelling 
air and land transportation; entire 
groups have been killed after 
quarantine on testing positive 
for tuberculosis; many die from 
transport injuries or stress in 
quarantine; and survivors show 
negative physiological and 
behavioural effects for several 
months after the journey. 

More airlines are likely to back 
away from a dirty job that they 
are ill-equipped to do properly. 
Crystal Miller-Spiegel American 
Anti-Vivisection Society, 
Jenkintown, Pennsylvania, USA. 
cmillerspiegel@aavs.org 


Primate studies: trials 
don’t always translate 


In your discussion on the 
campaign against animal research 
(Nature 483, 373-374; 2012), you 
mention a study in macaques 
that has moved into early clinical 
trials in humans, with promising 
results. Sadly, there is a yawning 
chasm between early promise in 
trials and efficacy. 

The US Food and Drug 
Administration reports that 
more than 90% of trials fail (see 
go.nature.com/h2365q), even 
though the treatments tested, 
by definition, were considered 
sufficiently efficacious and safe 
in animals to merit a clinical- 
trial licence. 

Many other treatments to 
protect the brain after stroke have 
failed in humans, despite success 
in rodent and primate trials 
(V. E. O’Collins et al. Ann. Neurol.
59, 467-477; 2006). None of the 
85 or so candidate HIV vaccines
that were effective in primates has
so far worked in humans (J. Bailey 
Altern. Lab. Anim. 36, 381-428; 
2008). The monoclonal antibody 
that caused severe inflammatory 
reactions in a 2006 clinical trial 
at Northwick Park Hospital, 
London, caused no problems in 
primates at 500 times the dose 
given to the human volunteers. 
The public is rightly concerned 
about the transportation of 
primates for questionable 
experimental purposes. These 
cannot justify the degree of 
suffering involved during 
capture, in breeding and holding 
facilities and during lengthy 
transportation (see go.nature.com/svbuvj).
Michelle Thew British Union 
for the Abolition of Vivisection, 
London, UK. 
michelle.thew@buav.org 


Higgs can claim name 
of massive boson 


Attempts to rule against naming 
the Higgs boson after physicist 
Peter Higgs suggest that political 
correctness is taking over from 
scholarship (Nature 483, 374; 
2012). Your suggestion that the 
name Higgs should be retained 
for reasons akin to business 
branding is hardly better. Higgs 
has a unique claim to the massive 
boson in question. 

My book The Infinity Puzzle 
(Oxford Univ. Press, 2011) 
covers the history of the Higgs 
hypothesis in detail. It is true that 
Higgs is one of several theorists 
who, in 1964, independently 
discovered how to give mass 
to fundamental particles, and 
that it would be inappropriate 
to refer to the hypothesis of 
mass generation as the ‘Higgs
mechanism’. However, it
was Higgs alone who drew 
attention to the massive boson 
whose detection can prove the 
hypothesis. So naming the boson 
after him, as Ben Lee did in 1972, 
is justifiable. 

Frank Close University of 
Oxford, Oxford, UK. 
fclose1@physics.ox.ac.uk 


More credit due to 
India’s scientists 


Any increase in India’s science 
budget for 2012-13 is likely 
to be wiped out by a 5-10% 
rise in the cost of research 
commodities, owing to the 
country’s high rate of inflation 
(Nature 483, 384; 2012). Neither 
will the modest extra funding 
tackle the glut of unemployed 
science PhD graduates (for 
example, around 60% of female 
science PhDs do not have a 
research position). 
The reasons for this situation 
are not just economic. In 
my opinion, India’s policy- 
makers are failing to recognize 
scientists’ achievements. In 
May last year, for example, 
environment minister 
Jairam Ramesh intimated 
that India’s elite institutions, 
which include the Indian 
Institutes of Technology and 
of Management, fall short of 
world-class standards; the head 
of the prime minister’s scientific 
advisory council, C. N. R. Rao, 
seems to agree with this view 
(see go.nature.com/snnbbt). 
Yet India is ranked 11th 
in the world by number of 
publications and 16th on the 
basis of total citations during 
the past 10 years. Many of these 
publications were in leading 
international journals. Instead 
of squandering this talent, the 
government should provide the 
incentive and the means for the 
nation to fulfil its potential. 
Jagadeesh Bayry Institut 
National de la Santé et de la 
Recherche Médicale, Paris, 
France. 
jagadeesh.bayry@crc.jussieu.fr


F. Sherwood Rowland


(1927-2012) 


Atmospheric chemist who linked human activity to ozone depletion. 


Chlorofluorocarbons (CFCs) were a
triumph of the chemical industry
and a mere curiosity in atmospheric
science when Sherwood (Sherry) Rowland, 
with his postdoc Mario Molina, recognized 
in 1973 that these seemingly inert gases 
posed a threat to Earth's ozone layer. Return- 
ing home one evening, Rowland remarked to 
his wife Joan that the research “is going very 
well, but it may mean the end of the world”. 

In their laboratory at the University of 
California, Irvine, Molina and Rowland 
had discovered that CFC-11 (CFCl₃) and
CFC-12 (CF₂Cl₂), then widely used as
refrigerants and aerosol propellants, readily 
absorbed ultraviolet light and broke down to 
release reactive chlorine. This work was the 
first step in tracing the causal chain linking 
industrial production of CFCs with global 
ozone depletion — and won Rowland and 
Molina the 1995 Nobel Prize in Chemistry, 
shared with Dutch chemist Paul Crutzen. 

Surrounded by his family at his home in 
Corona del Mar, California, Rowland died 
on 10 March, aged 84, from complications of 
Parkinson’s disease. He was born in Delaware,
Ohio; his mother was a Latin teacher and his 
father taught mathematics at Ohio Wesleyan 
University in Delaware, where Rowland 
attended college after graduating from high 
school at 15. When he was old enough, he 
enlisted in the US Navy. As a lanky athlete, he 
readily found a home in sports teams in the 
Navy and later in graduate school at the Uni- 
versity of Chicago, Illinois, where he played 
baseball for the university and for a semi- 
professional team. 

Rowland earned his PhD in nuclear chem- 
istry at Chicago under chemist Willard Libby 
and was taught by four other faculty mem- 
bers; counting Sherry, all six would later 
receive Nobel prizes. He met Joan at Chicago, 
and they moved to take up his early jobs at 
Princeton University in New Jersey and at the 
University of Kansas in Lawrence. In 1964, 
Rowland accepted an offer to start up the 
chemistry department at a new University of 
California campus in the then-unbuilt city 
of Irvine. Later, with atmospheric chemist 
Ralph Cicerone, he also helped to found the 
Earth system science department. 

The elegance of Molina and Rowland’s 
1974 Nature paper remains impressive to 
today’s atmospheric chemists, who live in a
world of satellite observations and supercom- 
puters. Stratospheric chemistry at the time 
was based on balloon-borne samplers of
trace gases and on one-dimensional models
that could now easily run on a smartphone.

Nonetheless, the pair measured the ultra- 
violet cross sections of CFC-11 and -12 in 
the lab, calculated their photolytic destruc- 
tion rates in the atmosphere and derived 
their atmospheric lifetimes as 50-100 years. 
They reviewed industrial production and 
emission of CFCs, projected the build-up 
and release of chlorine atoms in the strato-
sphere and concluded that ozone depletion
was likely and would be long-lived, even with
remediation. This work has been borne out,
in detail, by nearly four decades of research.

Rowland and Molina’s work started an
environmental movement that began with
scientists, led by Rowland, urging the elimi- 
nation of CFCs. It remains the best success 
story for global cooperation on a worldwide 
environmental threat. The activism led 
to the 1978 ban by the US Environmental 
Protection Agency on CFC use in aerosol 
cans, and finally in 1990 to the complete 
phasing out of CFC production by the 
Montreal Protocol and its amendments. 

In his unwillingness to back down from the 
implications of his work, Rowland became a 
role model to many of us, and remains so. 
This was a threat to some — particularly the 
CFC industry, but also, less understandably, 
to some scientific colleagues. For many years, 
Rowland experienced personal threats as well 
as irrational attacks on the science. 

Rowland’s science always stood tall, as 
did he, and seemed inerrant. He kept up his
interests in ozone and environmental policy,
but his research moved on. Soon after the 
ozone hole was discovered over Antarctica, 
he made major contributions with his gradu- 
ate student Neil Harris to the detection of 
ozone depletion over the Northern Hemi- 
sphere. This work was crucial in persuading 
DuPont and other chemical companies to 
abandon CFCs in favour of hydrochloro- 
fluorocarbons, which are less damaging to 
the ozone layer. 

In the late 1970s, Rowland initiated a 
programme to monitor background concen- 
trations of various gases, and that continues 
today. Six of its group members were work- 
ing in the field when he passed away. His 
curiosity demanded an objective approach, 
and so it was when, working with his former 
student and then fellow professor (D.R.B.), 
he identified in 1995 that the unusual mix 
of high ozone and hydrocarbons in Mexico 
City was due to leaking propane stoves and 
heaters, rather than traffic. In 2011, he was 
involved in discussions regarding the mix of 
atmospheric hydrocarbons resulting from 
the Deepwater Horizon oil spill in the Gulf 
of Mexico. 

Over almost five decades, Rowland was 
active in his research lab as well as teaching, 
playing tennis and having collegial discus- 
sions over lunch. When not travelling, he 
could be seen carrying his briefcase in one 
hand, with a pile of papers under the other 
arm, to and from his office. He was a prolific 
note-taker, filling a notebook in a week. This 
practice intimidated one of us (M.J.P.), who, 
while giving a talk at an international con- 
ference, first encountered Sherry in the front 
row, taking assiduous notes and then asking a 
terrifying, brutal, yet constructive question. 

Rowland treated everyone like a colleague. 
He disarmingly considered questions from 
any listener with the depth and profundity 
due a scientific peer. This trait was appre- 
ciated by students, friends and family. To 
Sherry, the question was of foremost impor- 
tance; it was at the core of his scientific quest. 
His passing ended a unique career that 
merged chemistry and atmospheric sciences, 
leading to a new partnership between science 
and policy for the protection of our planet.


Michael J. Prather is Fred Kavli Chair in 
Earth system science and Donald R. Blake 
is professor of chemistry at the University of 
California, Irvine, California, USA.


NEWS & VIEWS


CLIMATE SCIENCE 


Aerosols and Atlantic aberrations 


A cutting-edge global climate model links atmospheric aerosol emissions to temperature variability in the North Atlantic
Ocean, suggesting that human activity influences extreme weather events. SEE LETTER P.228 


AMATO EVAN 


Over the past century, the
surface of the North Atlan-
tic Ocean has gone through
warm and cool periods that are not 
observed in other ocean basins. 
This Atlantic Multidecadal Oscil-
lation (AMO)¹ is thought to affect
climate processes² ranging from the
current high levels of Atlantic hur- 
ricane activity to the devastating 
sub-Saharan droughts of the early 
1980s. Although the influence of the 
AMO on extreme weather events has 
long been recognized, the physical 
processes underlying these tempera- 
ture changes are not understood. On 
page 228 of this issue, Booth et al.³
report their use of a state-of-the-art 
model of Earth's climate to demon- 
strate that, at least over the past cen- 
tury, the AMO is largely the response 
of the upper ocean to changes in the concentra- 
tion of pollution aerosols in the atmosphere. If 
correct, their results imply that the influence of 
human activity on the Atlantic regional climate 
is more pervasive than previously thought*. 

The AMO is best depicted as the differ- 
ence between average ocean surface tem- 
peratures over the North Atlantic and those 
over the global oceans’ (Fig. 1). It therefore 
reflects the deviation of the North Atlan- 
tic Ocean from global mean temperatures, 
which are dominated by the long-term 
warming that is forced by greenhouse gases. 
Conventional wisdom has held that the 
AMO is the natural result of internal pro- 
cesses in the Atlantic Ocean — most notably, 
fluctuations in deep-ocean circulation, as 
supported by multi-century climate-model 
studies’. 
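
As a rough illustration of this definition, the index can be computed from a gridded sea-surface-temperature field as an area-weighted regional mean minus the global-ocean mean. The sketch below is not the procedure used by Booth et al.; the function names, the 0-60° N, 80° W-0° box and the cosine-latitude weighting are illustrative assumptions, and the field is assumed to contain no missing values.

```python
import numpy as np

def area_mean(field, lat, mask):
    """Area-weighted mean of field[time, lat, lon] over cells where mask is True."""
    # Grid cells shrink towards the poles, so weight each row by cos(latitude).
    weights = np.cos(np.deg2rad(lat))[:, None] * mask
    return (field * weights).sum(axis=(1, 2)) / weights.sum()

def amo_index(sst, lat, lon, ocean):
    """North Atlantic mean SST minus global-ocean mean SST, one value per time step."""
    # Assumed North Atlantic box: 0-60 N, 80 W-0 (longitudes given in -180..180).
    box = ((lat[:, None] >= 0) & (lat[:, None] <= 60) &
           (lon[None, :] >= -80) & (lon[None, :] <= 0))
    return area_mean(sst, lat, ocean & box) - area_mean(sst, lat, ocean)
```

Subtracting the global mean removes the greenhouse-gas-forced warming that all basins share, so a positive value marks a North Atlantic that is warm relative to the rest of the ocean.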

Booth et al.³ simulated the climate of the
past 150 years using a version of a well-known 
climate model* that includes up-to-date 
parameterizations of aerosol emissions, aero- 
sol chemistry and interactions of aerosols and 
clouds. Nearly all of the observed decadal vari- 
ability in North Atlantic surface temperatures 
was reproduced in their simulation, including
the AMO and the long-term warming associ-
ated with increasing amounts of greenhouse
gases. This is the first time that changes in sea
surface temperatures have been reproduced
to this degree of accuracy by a climate model.
The authors’ analysis of the model’s output
reveals that this variability about the upward
temperature trend results from cooling asso-
ciated with periodic volcanic eruptions, and
from the build-up of polluting aerosols in the
atmosphere that occurred from pre-industrial
times until the late 1960s and early 1970s,
when clean-air legislation in the United States
and Europe was implemented.

So how do aerosols affect sea surface
temperatures? When suspended over water,
aerosols tend to cool the surface by increasing
the local albedo (the ability to reflect sunlight),
a phenomenon known as the aerosol direct
effect. Aerosols caused by human activity may
also act as nuclei around which water vapour
in the atmosphere can condense. When more
of these nuclei are available for condensation
within a cloud, the number of water droplets
in the cloud goes up and the average size of
the water droplets goes down, making the
cloud brighter so that it reflects more sunlight
back out to space. This process is known as the
cloud albedo effect, or the first aerosol indirect
effect. It is these aerosol-cloud
interactions that have the most influ-
ence over the AMO in Booth and
colleagues’ model.

The idea that variations in aerosol
concentration have caused decadal-
scale changes in surface tempera-
tures in the North Atlantic is not
new. A body of work has emerged
suggesting that elements of the
AMO are externally forced by aero-
sols through direct and first indirect
effects, and has implicated aerosols
from volcanic eruptions’, West
African dust storms® and human
activity”. However, before Booth
and colleagues’ work, no study had
incorporated the direct and indi-
rect forcings from these various
aerosol types to paint a coherent
picture of temperature changes in
the Atlantic Ocean that was con-
sistent with both the observed
temporal variability and the dominant
spatial structure of the changes.

Figure 1 | The Atlantic Multidecadal Oscillation. The difference
between average ocean surface temperatures over the North Atlantic and
those over the global oceans has oscillated between cool and warm phases
over the past hundred years or so, as depicted here. Booth et al.³ report
that simulations of global climate link this temperature variability to
atmospheric aerosol emissions.
(Plot axes: temperature difference, °C, against year, 1875-1975.)

*This article and the paper under discussion³ were
published online on 4 April 2012.

Booth and colleagues’ evidence³ that the
AMO is caused by changes in the regional 
abundance of aerosols is compelling, but 
their results are sensitive to model para- 
meterizations of microphysical processes, 
particularly the interaction between cloud 
water droplets and aerosols, that are not well 
constrained by observations. In addition, their 
model was unable to reproduce observed 
multidecadal variability in outbreaks of Afri- 
can dust storms”, which alter the temperature 
of the tropical Atlantic®; this may explain why 
the model does a poorer job of simulating 
temperatures in the tropical North Atlan- 
tic Ocean than it does in the extratropical 
regions. Furthermore, the authors’ conclusion 
that internal variability of the Atlantic Ocean 
had a negligible role in shaping the AMO dur- 
ing the twentieth century is at odds with the 
findings of several previous studies”''. The 
reason for this discrepancy is not clear. 

If Booth and colleagues’ results³ can be cor-
roborated, then they suggest that multidecadal
temperature fluctuations of the North Atlantic 
are dominated by human activity, with natu- 
ral variability taking a secondary role. This has 
many implications. Foremost among them is
that the AMO does not exist, in the sense that
the temperature variations concerned are 
neither intrinsically oscillatory nor purely 
multidecadal. 

Another implication concerns hurri- 
canes. As noted earlier, quiescent and active 
periods of Atlantic hurricane activity have 
been linked’ to the AMO. These swings in 
hurricane frequency and intensity might 
therefore be the regional response to varia- 
tions in the concentration of pollutant aero- 
sols against a background of global warming, 
and thus completely man-made. Similarly, 
human activity might have caused periods 
of drought within the Sudano-Sahel region 
of Africa and in northeastern Brazil.

As we try to predict the climate in a warm-
ing world, an increasing body of work sug- 
gests that aerosols may have regional effects as 
great as those caused by the global increase in 
atmospheric carbon dioxide. Booth and col-
leagues’ work³ underscores the importance of
understanding the diverse pathways by which 
humans alter the climate.


Running to stand still


Transcription factors regulate the expression of genes by binding to certain DNA 
sequences. But the outcome can be markedly different, depending on whether 
the binding is stable or short-lived. SEE LETTER P.251 


TOMMY KAPLAN & NIR FRIEDMAN 


To reproduce, differentiate or even just
respond to changes in their surround-
ings, cells need to control the expression
of thousands of genes. One way of doing this 
is to use transcription factors — proteins that 
bind to regulatory regions on their target genes 
and either activate or repress the transcrip- 
tion of DNA into RNA. Over the past decade, 
researchers have analysed the binding sites of 
hundreds of these proteins on the genomes 
of many organisms and cell types, and meas- 
ured the gene-expression patterns within the 
same cells. In such studies, the overall degree 
of occupancy by a transcription factor at a 
regulatory region is commonly interpreted as 
an indication of the protein’s ability to control 
the expression of the gene. However, transcrip- 
tion factors also bind to thousands of genes in a 
weak, and probably non-functional, manner¹.
On page 251 of this issue, Lickwar et al.² illu-
minate this matter by reporting the results of a
systematic, genome-wide study of the binding 
dynamics of a particular transcription factor. 
The authors find that transcription levels have 
a stronger link to the kinetics of binding than 
to the total occupancy of the factor. 

The DNA-binding sites of the transcription 
factor Rap1 along the genome of the yeast 
Saccharomyces cerevisiae were mapped more 
than a decade ago³. The mapping used a
genome-wide protein-DNA binding assay, 
known as chromatin immunoprecipitation 
(ChIP)-on-chip, or ChIP-seq, which identifies 
the genomic locations of a transcription factor
over a huge number of live cells and therefore
averages the transcription-factor occupancy
over a large population. This is still the method 
of choice in similar genome-wide studies. 
However, a high occupancy of a transcription 
factor at a specific site — as detected by this 
technique — can mean either that the factor 
is constantly bound to this DNA location in
some of the cells, or that it is transiently bound
in many cells. 

To distinguish between the two possibilities, 
Lickwar et al.² adapted a strategy, previously
used to measure the turnover of DNA-bound 
proteins*°, to address the question of tran- 
scription-factor binding stability. The authors 
created yeast cells that produced two Rap1 
variants, Rap1-Flag and Rap1-Myc, each 
one with a ‘tag’ consisting of a specific pep- 
tide that could be recognized by antibodies. 
Furthermore, Rap1-Flag was produced con-
stantly, whereas Rap1-Myc’s expression was
experimentally inducible. The authors then
measured the binding of each Rap1 variant to
the yeast genome in a dense time series after 
Rap1-Myc induction. Although the inducible 
protein quickly outcompeted Rap1-Flag at
some genomic sites, it was slowly incorporated
into other sites, which suggested that Rap1
binding to the first sites was less stable than to
the other sites.

Figure 1 | Well-balanced gene expression. Transcription factors can activate the expression of genes
by binding to certain regulatory regions on the genome. Lickwar et al.² studied, at the genomic scale, the
binding dynamics of one of these proteins, and propose the following model for transcription-factor
function. a, Most of the genome is wrapped around histone proteins to form nucleosomes, the basic units
of DNA packaging into chromatin. b, Nucleosomes and transcription factors compete for binding to
some regulatory regions. Transcription-factor binding to these regions occurs in short pulses, which are
not sufficient for efficient transcription of the gene into RNA. c, When the transcription factor binds to its
target site for longer periods, it recruits the core transcriptional machinery required to start transcription,
leading to high transcription rates.


Amato Evan is in the Department of
Environmental Sciences, University of
Virginia, Charlottesville, Virginia 22904, USA.
e-mail: ate9c@virginia.edu


1. Kerr, R. A. Science 288, 1984-1985 (2000).
2. Knight, J. R., Folland, C. K. & Scaife, A. Geophys.

The researchers then applied a mathematical 
model’ to estimate the rate of Rap1 turnover at
more than 400 target genes. They found that,
among genes with high Rap1 occupancy, those
with slower Rap1 turnover showed higher
transcription levels than those with faster
Rap1 turnover. That is, the transcription level
depended on how long Rap1 remained bound.
Of note, the genomic sites that exhibited fast
Rap1 turnover in this analysis² have previously
been reported”” to have fast turnover rates 
of nucleosomes (protein complexes around 
which DNA is packaged) and of the general 
transcription factor TBP (which facilitates the 
binding of the core transcriptional machin- 
ery). Overall, the results are consistent with 
those of other studies”* that showed that tran- 
scription factors and nucleosomes compete for 
some genomic sites and that this competition 
leads to inefficient transcription. 
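
The link between incorporation time and binding stability can be illustrated with a deliberately simple exchange model. The sketch below is not the model used by Lickwar et al.; it assumes that, at each site, resident Rap1-Flag is replaced by Rap1-Myc as a first-order process with a single turnover rate k, and that the induced protein is available immediately. The function names and the units of k (per minute) are hypothetical.

```python
import numpy as np

def myc_fraction(t, k):
    """Fraction of bound Rap1 carrying the Myc tag at time t after induction,
    under first-order exchange with turnover rate k (assumed model)."""
    return 1.0 - np.exp(-k * np.asarray(t, dtype=float))

def fit_turnover(t, frac):
    """Estimate k by least squares on -ln(1 - frac) = k * t (line through the origin)."""
    t = np.asarray(t, dtype=float)
    y = -np.log(1.0 - np.asarray(frac, dtype=float))
    return float((t * y).sum() / (t * t).sum())

# Two hypothetical sites sampled at the same time points (minutes).
times = [10.0, 20.0, 40.0, 60.0]
fast_site = myc_fraction(times, 0.10)   # unstable binding, rapid replacement
slow_site = myc_fraction(times, 0.01)   # stable binding, slow replacement
print(fit_turnover(times, fast_site), fit_turnover(times, slow_site))
```

Under these assumptions, two sites can end with similar total occupancy yet differ tenfold in k, which is the distinction that a single-time-point occupancy measurement cannot make.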

Lickwar et al. suggest a model for the binding 
dynamics of transcription factors that activate 
transcription (Fig. 1). In this model, on bind- 
ing to a target site, the factor has to recruit the 
core transcriptional machinery. This process 
takes some time. Therefore, if the factor’s 
binding to the DNA site is unstable, it will 
not lead to productive transcription. Indeed, 
it has been shown’ that short, repeated pulses 
of Msn2 — another transcription factor — 
into the cell nucleus do not activate target 
genes, whereas longer pulses do. Therefore, 
for transcription factors to be effective acti- 
vators, they require stable binding to their 
target DNA. 

Moreover, the researchers speculate that a 
constant turnover or ‘treadmilling’ of nucle- 
osomes and transcription factors acts as a 
distinct mechanism for transcriptional regu- 
lation. Unlike static gene repression”, in which 
transcription is prevented by the nucleosome’s 
protection of DNA, a site that has a treadmill- 
ing transcription factor is poised for activation. 
When, somehow, the nucleosome is removed 
or its affinity for DNA is decreased, the factor 
can quickly achieve stable binding to its target 
sequence and so activate the gene’s transcrip- 
tion. Several mechanisms would allow for the 
targeted eviction of nucleosomes, including 
chromatin-remodelling enzymes (which move 
nucleosomes on DNA), chemical modifica- 
tions of histones (the protein components of 
nucleosomes) or the replacement of certain 
histones with specific variants. 

Lickwar and colleagues’ study² explains how
different regulatory regions can present simi- 
lar levels of transcription-factor occupancy 
and different transcriptional levels. But it also 
raises further exciting questions. Do the differ- 
ent turnover rates of transcription factors play 
a key part in gene regulation? Or do they just
reflect some other aspects of the transcriptional
process, such as stabilization of protein-DNA
binding by interactions with the transcrip- 
tional machinery? Are nucleosomes the 
only competition for factor binding to DNA, 
or is competition with other transcription 
factors important too? 

To fully understand how transcription fac- 
tors work, we should consider not only their 
overall binding occupancy, but also their 
binding dynamics. This line of research will 
form the basis for a much-needed quantita- 
tive understanding of transcription regulation 
kinetics.


Fresh light on stardust


Ageing stars produce elements vital for life and disperse them into space on 
stellar winds. The discovery of large dust grains in the vicinity of cool giant stars 
sheds light on the mechanisms that drive such winds. SEE LETTER P.220 


SUSANNE HOFNER 


Chemical elements that are crucial for
building Earth-like planets and living
organisms have their origin in ageing
stars and stellar deaths. The nuclear processes 
that create these elements are well understood, 
but the mechanisms that transport them to the 
stellar surfaces and out into interstellar space 
are still a matter of intensive research. On page 
220 of this issue, Norris et al.¹ report the detec-
tion of silicate particles about 600 nanometres
in diameter in the immediate vicinity of several
cool giant stars. This result confirms the pre-
dictions of models² that explain how gas can
escape stellar gravity and become part of the 
cosmic-matter cycle. 

Stellar explosions known as supernovae 
have considerable input into the produc- 
tion and dispersion of heavy elements (those 
heavier than helium), but they are not the sole 
contributors. Stars, including the Sun, release 
continuous outflows of gas, called stellar 
winds, for most of their lives. As stars evolve 
into cool giants and supergiants, stellar winds 
become increasingly effective in transporting 
matter out of stellar-gravity wells, enriching 
the surrounding interstellar medium with 
newly produced chemical elements. Winds 
of ageing low- and intermediate-mass stars, 
such as those observed by Norris et al.¹, lead
to a runaway mass-loss process that eventually
stops the stars’ nuclear processes and turns
them into white dwarfs. 

The mechanisms that drive gas away from 
stars differ, depending on a star’s surface tem- 
perature, mass, luminosity, chemical com- 
position and magnetic field. Winds of cool, 
luminous giants are presumably triggered by 
radiative acceleration of dust grains that form 
in the extended stellar atmospheres. Momen- 
tum is transferred from the photons emitted 
by the star to the dust grains through photon 
absorption and scattering, and is subsequently 
redistributed to the more numerous gas parti- 
cles by collisions with the dust grains (Fig. 1). 
Because the star’s photons are predominantly 
directed away from the stellar surface, the flow 
of gas and dust also follows this pattern. 
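
Whether radiation pressure on dust can drive an outflow is commonly judged by comparing the outward radiative acceleration with the inward pull of gravity. Because both fall off with the square of the distance from the star, their ratio reduces to Γ = κL/(4πcGM), where κ is the flux-averaged opacity per unit mass of the gas-dust mixture; an outflow needs Γ > 1. The sketch below evaluates this standard ratio. The function names and the example stellar parameters are illustrative assumptions, not values taken from Norris et al.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m s^-1
M_SUN = 1.989e30   # solar mass, kg
L_SUN = 3.828e26   # solar luminosity, W

def gamma(kappa, lum_lsun, mass_msun):
    """Ratio of radiative to gravitational acceleration for opacity kappa (m^2/kg)."""
    return kappa * lum_lsun * L_SUN / (4.0 * math.pi * C * G * mass_msun * M_SUN)

def critical_opacity(lum_lsun, mass_msun):
    """Opacity (m^2/kg) at which radiation pressure exactly balances gravity."""
    return 4.0 * math.pi * C * G * mass_msun * M_SUN / (lum_lsun * L_SUN)

# Hypothetical cool giant: 5,000 solar luminosities and 1 solar mass.
kappa_crit = critical_opacity(5000.0, 1.0)
print(f"critical opacity: {kappa_crit:.3f} m^2/kg")
```

Because Γ scales with L/M, luminous, low-mass giants need far less opacity per unit mass than Sun-like stars do, which is why dust can matter for their winds.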

Although the physical principles of dust- 
driven winds are reasonably well understood, 
there is currently no consensus on which types 
of grain are driving the outflows. However, 
some basic features are known. First, the mass 
of the gas that is pushed outwards by the dust 
is more than 100 times higher than the collec- 
tive mass of the dust particles. This requires 
grains made from abundant materials that 
have large radiative cross sections to drive the 
winds. Second, the grains have to form close to 
the star to trigger the outflows. This distance 
is limited by how far shock waves, caused by 
pulsation and convection in the stellar inte- 
riors, can lift gas above the stellar surface, at 
which temperatures are typically 3,000 kelvin,
too high for solid particles to form. The strong
flux of stellar photons, which not only accelerates
but also heats the grains, severely
limits the types of dust material that can form
and survive at distances at which winds are
triggered.

Figure 1 | Dust-driven stellar winds. Dust grains forming in the atmosphere of a cool luminous star
are accelerated away (white arrow) from the star through absorption and emission or scattering of stellar
photons. By subsequently colliding with molecules in the surrounding gas, the grains accelerate the
molecules, make them collide with other gas molecules and trigger an outflow of gas, or stellar wind.
Norris and colleagues’ study of the immediate vicinity of several cool giant stars provides information on
the sizes and material properties of the grains that drive stellar winds.

On the basis of element abundances and 
distinctive spectral features, magnesium iron 
silicates of the olivine and pyroxene type have 
long been considered to be promising candidates
for wind drivers. But detailed models
demonstrate that silicate grains have to be
almost devoid of iron to exist at the distance
from the stars at which the winds are accelerated,
in contrast to earlier assumptions.
Because such particles are highly transparent
at near-infrared wavelengths, at which
most of the stellar flux is emitted, photon
absorption will be insufficient to trigger
an outflow. If, however, such iron-free silicate
grains grow to 100-1,000 nm in radius,
photon scattering dominates over absorption,
providing sufficient radiative acceleration for
driving winds.

It is therefore of note that silicate grains 
collected in the Solar System by NASA’s 
Stardust mission tended to be iron-poor.
However, whether this reflects the original 
composition of grains produced in previ- 
ous generations of stars before the Solar 
System formed, or whether stardust is heavily 
modified in interstellar space and during 
planetary-system formation, is unclear. To test 
theoretical predictions, the size and composi- 
tion of dust grains have to be measured in the 
immediate vicinities of the stars in which they 
are formed. 

By analysing starlight scattered on dust
grains close to several cool giants, Norris
et al. found silicate particles with diameters of
600 nm at distances of about two stellar radii, in
accordance with model predictions. This result
was made possible by a clever combination of 
advanced instrumentation and observational 
methods. The authors used measurements 
of polarization (the direction in which light’s 
electric field oscillates) to identify photons 
scattered by dust and to determine the size 
of the grains, and employed interferometric 
techniques to obtain high-spatial-resolution
images of the close stellar environments in 
which the winds are triggered. 

Identifying wind-driving dust grains in cool 
giants is a step towards a comprehensive pic- 
ture of stellar-mass loss, which is essential for 
understanding ageing stars and their role in 
the cosmic-matter cycle. Stellar winds strongly 
affect the evolution of low- and intermediate- 
mass stars. These stars are probably the pro- 
genitors of type Ia supernovae, which are a 
cornerstone of cosmological studies. Reliable
mass-loss rates and dust yields are essential 
ingredients for understanding the chemical 
evolution of the Milky Way and other galaxies, 
and for estimating the effects of interstellar 
dust on the light of distant galaxies, which 
is used in studies of the early Universe. The 
advanced technique used by Norris et al. for
studying the properties of stardust at its place
of origin provides crucial input to these fields
of research.

Adaptation by target
remodelling


Bacteria direct their movement in response to certain chemicals by controlling 
the rotation of whip-like appendages called flagella. The sensitivity of the 
response can be adjusted at the signal’s target, the flagellar motor. SEE LETTER P.233 


GERALD L. HAZELBAUER 


When we switch on a lamp in a dark
room, our eyes adapt quickly to
the lighting. Similarly, in a process
known as chemotaxis, specific sensory systems
in bacteria detect and adapt to changes in the 
concentration of molecules such as nutrients, 
so that the microbes can control the motion 
of their flagella and swim to more favourable 
environments. Many adaptive mechanisms act 
on cell-surface receptor proteins that detect 
stimuli and generate signals across the cell 
membrane and within the cytoplasm. Indeed, 
the sensitivity of the bacterial chemotaxis
machinery is regulated by addition or
removal of methyl groups on the chemical-sensing
receptors (chemoreceptors). But, as
Yuan et al. report on page 233 of this issue,
the machinery can also adapt by remodel- 
ling the target of signalling — the flagellar 
motor. Motor adaptation differs from recep- 
tor adaptation in its time frame, mechanism 
and function. 

100 Years Ago

The loss of the “Titanic”

The terrible loss of life on account of the disaster to the Titanic has directed emphatic
attention to various aspects of the employment of wireless telegraphy in times of crisis
at sea. The point which is at the moment attracting most of the public attention is that
of the erroneous messages, or alleged messages, which appeared in the newspapers
in the day or two following the disaster ... All this raises more prominently than ever
the chaotic condition of wireless telegraphy in the United States ... [T]he most urgent
call for help will pass unheeded if none of the operators on the ships within hail are on
duty. In fact, it seems to have been a mere chance that the Carpathia operator was at
his apparatus at the time the Titanic called. On ships that carry only one operator
(and very few carry more) the man cannot always be on the look-out ...

Engineering aspects of the disaster are discussed in the leading article in Engineering
for April 19 ... [S]everal questions present themselves as ripe for discussion and
settlement. The effect of centre-line or longitudinal wing bulkheads is one of these. Such have
advantages in confining any water admitted to a part of the width, but have disadvantages
even from the point of view of stability under disastrous conditions. The effect of impact
on the superstructure of very large ships will have to be considered. In such ships it has
become a practice to have two or three decks above the moulded structure. Would inertia
have effects somewhat similar to those experienced in railway collisions, in which
the body of the carriage is driven from the under-frame? ... The engineers of the ship
have all been lost: their claim to recognition is the simplest and best; they did their
duty to the end.

From Nature 25 April 1912

At the time when the Titanic was lost the standing Advisory Committee appointed
by the Board of Trade under the provisions of Merchant Shipping Acts was engaged in
the reconsideration of the regulations for boats and life-saving appliances ... The main
recommendations of [their] report may be summarised. First, it is recognised that “the
stability and seaworthy qualities of the vessel itself” must be regarded as of primary
importance. This includes the question of watertight subdivision, now under investigation
by a special committee. Second, as regards boats and life-saving appliances
it is recommended that accommodation should be provided for the total number of
persons which each foreign-going passenger steamship is licensed to carry ... One
of [the committee’s] most valuable recommendations is that proposing to extend the
present regulations and to prescribe to those in charge of ships the necessity for
proceeding at moderate speed “at night in the known vicinity of ice”.

From Nature 29 August 1912

It behoves surely men of science to ask the question whether we have not reached
the imperative limits of that false security which the “practical man” is wont to feel
in his contempt for scientific “theory”; and further, whether the time has not therefore
come for legislation requiring commanders of the largest ocean-going steamers to hold
a diploma, guaranteeing such a systematic course of study (say in a class at Greenwich
or Kensington) in marine physiography and the elementary laws of mechanics
as would quicken their imagination as to the uncertainty and the magnitude of the
risks to be run in an abnormally ice-drifted sea. Lord Mersey’s report may whitewash
the facts, but the facts en évidence remain; and the chain of cause and effect in the
lamentable and tragic loss of the Titanic leads us in the last resort to the notorious
contempt for scientific acquaintance with the facts and laws of nature on the part of the
“practical man”.

From Nature 12 September 1912

Titanic sank on 15 April 1912 on its maiden voyage, after hitting an iceberg off Newfoundland.

In the bacterial chemotaxis system, different chemoreceptors can bind to either attractant
or repellent ligand molecules, and by doing so they alter the activity of a receptor-associated
kinase enzyme (Fig. 1). The kinase adds a phosphate group to the protein CheY, which
can bind only in its phosphorylated form (CheY-P) to FliM, a protein component of the
flagellar rotary motor. Binding leads to a switch in flagellar rotation from anticlockwise,
the default state, to clockwise. As anticlockwise rotation powers forward swimming (‘runs’)
and clockwise rotation causes abrupt directional changes (‘tumbles’), their alternation
generates a three-dimensional random walk. When the bacterium swims towards increasing
concentrations of attractant, the interaction of attractant with its receptors reduces kinase
activity, and thus CheY-P levels and the probability of tumbles, biasing the random walk in
favourable directions.

But when ligand concentration remains constant, the system adapts by reverting to its
null state, so that it can then respond to any additional concentration changes. Adaptation
occurs through the addition or removal of methyl groups to the receptors in response
to occupancy by attractants or repellents. Methylation and demethylation shift receptor
conformation and thus kinase activity, CheY-P levels, rotational bias and swimming behaviour
in the opposite direction to attractant and repellent occupancy, respectively. These receptor
adaptive changes occur more slowly than those generated by ligand binding, and this
difference provides the bacterium with a means of comparing current and recent
ligand concentrations through the opposing influences of receptor occupancy and the
extent of methylation. The overall result is that the microbe can sense temporal changes
in the concentration of relevant molecules and therefore can swim in the ‘right’ direction.
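
The way a lower tumble probability biases a random walk can be illustrated with a toy simulation. The sketch below is a deliberate simplification and not a model from the article: it uses one dimension instead of three, assumes the attractant increases towards positive x, and lets the cell "know" its heading directly rather than comparing concentrations over time. All names and parameter values are hypothetical.

```python
import random


def simulate_run_and_tumble(steps=2000, base_tumble_prob=0.2, bias=0.5, seed=0):
    """Final position of a 1D run-and-tumble walker.

    Each step the cell moves one unit in its current direction (a 'run').
    It then tumbles, choosing a new direction at random, with a probability
    that is reduced by the factor (1 - bias) while heading up the gradient.
    """
    rng = random.Random(seed)
    position = 0
    direction = rng.choice([-1, 1])
    for _ in range(steps):
        position += direction
        if direction > 0:
            tumble_prob = base_tumble_prob * (1.0 - bias)  # fewer tumbles up-gradient
        else:
            tumble_prob = base_tumble_prob
        if rng.random() < tumble_prob:
            direction = rng.choice([-1, 1])
    return position


def mean_displacement(bias, cells=200):
    """Average final position over a population of independent cells."""
    total = sum(simulate_run_and_tumble(bias=bias, seed=s) for s in range(cells))
    return total / cells


print("mean displacement without bias:", mean_displacement(0.0))
print("mean displacement with bias:   ", mean_displacement(0.5))
```

With no bias the population stays centred near its starting point, whereas suppressing tumbles during up-gradient runs produces a net drift towards the attractant.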

The flagellar motor is extremely sensitive to
small variations in CheY-P concentration, and
this poses a challenge to the system because
the steady-state CheY-P concentration varies
among individual cells. Given this variation,
some cells would be expected to run only or 
to tumble only, and therefore to be unable to 
perform proper chemotaxis. As this does not 
seem to occur, a mechanism must exist to keep 
a correspondence between steady-state CheY-P 
levels and the narrow CheY-P concentration 
range over which the rotational switch is sensitive.
However, previous research found no evidence for
one possible mechanism: feedback from the
flagellar rotary motor to the kinase.

Yuan et al. investigated an alternative
possibility: could the motor itself adapt to 
changes in CheY-P concentration? The authors 
used a strain of Escherichia coli that lacked 
the enzymes that carry out receptor methyl- 
ation and demethylation, as otherwise the 
process under study would have been masked 
by the rapid and complete adaptation medi- 
ated by these enzymes. When the bacteria 
were stimulated by an attractant, the authors 
observed that the CheY-P concentration and 
the motor’s clockwise bias both decreased 
rapidly, as expected. However, over the next 
few minutes, even though CheY-P levels 
remained unchanged, the clockwise bias grad- 
ually increased to an intermediate steady-state 
value that was lower than in the absence of 
attractant. A similar slow, partial adaptation of 
the flagellar motor was reported 25 years ago
in bacteria lacking the receptor methylation 
and demethylation enzymes. 

The researchers also measured the clockwise
bias of flagellar motors in individual bacterial
cells before stimulation, at maximum
response and after partial adaptation. In the
presence of persistently decreased CheY-P 
concentration, they documented a shift in 
the motor’s sensitivity to CheY-P to a lower, 
but still narrow, concentration range. When 
they analysed these data using a mathematical 
model for cooperative motor switching, the 
results indicated that the adaptation probably 
reflected an increased number of binding sites 
for CheY-P in the flagellar motor — in other 
words, more copies of FliM. 
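
To see why more binding sites would shift the motor's sensitive range to lower CheY-P concentrations, consider a generic two-state cooperative switching model of the Monod-Wyman-Changeux type. This is a sketch under stated assumptions, not the authors' fitted model: the dissociation constants, the zero-ligand state ratio (held fixed as sites are added) and the function names are all hypothetical. Only the site numbers 34 and 42 come from the text.

```python
def cw_bias(chey_p, n_sites, k_cw=2.0, k_ccw=4.0, l0=1.0e4):
    """Probability that a two-state motor is in the clockwise (CW) state.

    chey_p  -- CheY-P concentration (arbitrary units, same as k_cw and k_ccw)
    n_sites -- number of CheY-P binding sites (FliM copies) in the motor
    k_cw    -- assumed dissociation constant for binding to the CW state
    k_ccw   -- assumed dissociation constant for binding to the CCW state
    l0      -- assumed CCW:CW ratio with no CheY-P bound (held fixed here)
    """
    # CheY-P binds the CW state more tightly (k_cw < k_ccw), so each bound
    # site tilts the balance towards CW; the effect compounds over n_sites.
    ratio = (1.0 + chey_p / k_ccw) / (1.0 + chey_p / k_cw)
    return 1.0 / (1.0 + l0 * ratio ** n_sites)


def half_bias_concentration(n_sites, lo=0.0, hi=100.0, tol=1e-9):
    """Bisection search for the CheY-P level at which CW bias equals 0.5."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cw_bias(mid, n_sites) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (34, 42):
    print(n, "sites: half-maximal CW bias at CheY-P =",
          round(half_bias_concentration(n), 3))
```

In this toy model, raising the number of sites from 34 to 42 moves the midpoint of the switching curve to a lower CheY-P concentration while the curve remains steep, which is qualitatively the behaviour described above.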

To test this hypothesis, the authors 
expressed a modified version of FliM (fused to 
a yellow fluorescent protein) in the E. coli cells, 
and then observed these cells using total inter- 
nal reflection fluorescence (TIRF) microscopy. 
The researchers did indeed detect increases of 
up to about 25% in the number of FliM copies 
(from 34 to 42 copies) in flagellar motors upon 
partial adaptation. This result is consistent 
with a previous finding of CheY-P-dependent
turnover of FliM units, and suggests a func- 
tion for such a process: the addition of FliM 
copies to the flagellar motor would increase the 
probability of CheY-P binding to the motor
and therefore the sensitivity of the rotational
switch.

Figure 1 | Microbial sense and sensibility. Protein receptors on a bacterial cell’s surface bind to specific
attractant or repellent molecules, which act as inputs for the bacterium’s sensory system. In the absence
of attractant, a receptor-associated kinase enzyme donates a phosphoryl group (P) to the protein CheY.
Phosphorylated CheY (CheY-P) is an intracellular signal that interacts with the protein FliM at the
flagellar motor. This interaction generates a change in rotational direction of the motor, the system’s
output. Upon attractant binding, the receptor undergoes a conformational change that reduces kinase
activity and consequently the levels of CheY-P and the probability of a change in the direction of rotation.
Binding of repellent has the opposite effects. If attractant or repellent concentrations do not change for a
while, the sensory system ‘resets’ rapidly and completely, within seconds, to a null state through receptor
methylation (for attractant) or demethylation (for repellent) at the input end. Yuan et al. find that the
system also slowly and partially adapts at the output end to different, persistent levels of CheY-P. It does so
by changing the number of FliM copies in the flagellar motor, thus shifting the narrow sensitivity range of
the motor to correspond to different steady-state levels of CheY-P.

Yuan and colleagues’ results generate
questions for future research. For instance, 
how does the flagellar motor sense changes 
in the level of CheY-P? And how does sensing 
lead to an increased number of FliM copies? 
How few and how many copies can be accom- 
modated, and what structural interactions 
between FliM units allow such variability?
Answering these questions will be challenging, 
but insight could be gained by identifying and 
characterizing mutant bacteria that are defec- 
tive in motor adaptation, and by extending 
high-resolution fluorescence analyses such as 
the TIRF microscopy experiments done by the 
authors. Full answers are likely to require both 
biochemical and structural studies. 

It also needs emphasizing that motor 
adaptation is a slow process, occurring over 
minutes. So, in the absence of methylation- 
dependent adaptation of the receptors, which 
occurs within seconds, bacteria cannot perform
chemotaxis. Nonetheless, motor
adaptation is likely to be crucial for accom- 
modating stochastic variations in the dosages 
of protein components of the chemotaxis 
sensory machinery, such as CheY. The adjust- 
ment of the sensitivity of a signalling system 
by adaptive remodelling of its output end is a 
tantalizing observation, because experience
tells us that if a new strategy is discovered in
one biological system, then there are undiscovered
examples in other systems. Therefore, as
the authors suggest, other biological molecular
machines may adapt to changes in signal levels
by resetting the sensitivity of their response.

High-valent organometallic copper and
palladium in catalysis

Amanda J. Hickman1 & Melanie S. Sanford1

Copper and palladium catalysts are critically important in numerous commercial chemical processes. Improvements in
the activity, selectivity and scope of these catalysts could drastically reduce the environmental impact, and increase the
sustainability, of chemical reactions. One rapidly developing strategy for achieving these goals is to use ‘high-valent’
organometallic copper and palladium intermediates in catalysis. Here we describe recent advances involving both the
fundamental chemistry and the applications of these high-valent metal complexes in numerous synthetically useful
catalytic transformations.


Homogeneous copper- and palladium-catalysed reactions are widely used for the
construction of important organic molecules, including pharmaceuticals, commodity
chemicals and polymers. The development of copper and palladium catalysis has been
inextricably linked because both metals have been used extensively in the
construction of similar types of carbon-carbon and carbon-heteroatom
bonds. Furthermore, advancements in, and insights into, copper chemistry
have often spurred improvements in palladium-catalysed processes, and
vice versa, leading to a wide range of robust, synthetically valuable and
often complementary synthetic methods. This Review focuses particularly
on an area that has progressed tremendously over the past decade: the use
of high-valent copper and palladium complexes in catalysis (see Box 1 for
definitions of ‘high-valent’ copper and palladium).

History and importance of copper catalysis 

Copper is an inexpensive, earth-abundant, non-toxic metal that has 
found widespread application in homogeneous catalysis. For example, 
copper-catalysed cross-coupling reactions have been extensively studied 
since their discovery at the turn of the twentieth century. Such
reactions serve as versatile methods for synthesizing biaryl linkages as
well as for constructing the carbon-heteroatom bonds of aryl amines,
aryl ethers and aryl thioether derivatives. The alkylation of carbon
electrophiles with organometallic copper compounds (organocuprates)
is another classic reaction in organic synthesis. Diverse carbon-carbon
bond-forming reactions of organocuprates have been developed over 
the past 70 years, and these transformations are featured in most intro- 
ductory undergraduate textbooks on organic chemistry. 

High-valent copper intermediates (that is, organometallic copper(III)
(Cu(III)) species) have long been proposed to have a role in both copper-catalysed
cross-coupling and organocuprate reactions. In particular,
carbon-carbon and/or carbon-heteroatom bond formation from an
organo-Cu(III) species has been invoked as the product-releasing step
of many of these transformations. However, the proposed high-valent
copper intermediates have eluded detection for decades, and as a
result there has been considerable controversy over the mechanistic
details of both organocuprate additions and copper-catalysed cross-coupling.
Indeed, until very recently, these two transformations were
among the least mechanistically understood synthetic methods in
organometallic chemistry. These mechanistic questions and controversy
have provided tremendous motivation for probing the accessibility and
reactivity of organo-Cu(III) species. A deeper mechanistic understanding
of their chemistry promises to allow the development of improved copper
catalysts for known reactions as well as to inspire novel copper-catalysed
transformations. In the section on high-valent copper below, we discuss
recent progress on many of the vital questions in this area (discussed in
detail in Box 1).


History and importance of palladium catalysis 

Although copper-mediated cross-coupling methods were the first of 
their kind, today cross-coupling has become synonymous with a differ- 
ent metal: palladium. Well-defined palladium-catalysed cross-coupling 
reactions were first developed in the 1970s, and they quickly surpassed 
copper-based methods in both popularity and scope. These reactions 
have transformed the way organic chemists approach the construction 
of bonds in complex molecules’, and the wide-ranging impact of this 
methodology was recognized in the awarding of the Nobel Prize in 
Chemistry in 2010. 

The rapid success of palladium-catalysed cross-coupling is due, in 
large part, to extensive and systematic investigations of reaction 
mechanisms. Mechanistic analysis has revealed that nearly all of these 
processes involve catalysis by ‘low-valent’ palladium (that is, palladium 
in the 0 (Pd(0)) or +2 (Pd(II)) oxidation states). For many reactions,
Pd(0) and Pd(II) catalytic intermediates have been well characterized
and the steric and electronic influence of supporting ligands on catalysis
has been studied in detail. Such mechanistic studies have been crucial
in the development of new catalyst structures and novel transformations
with wide scope and mild reaction conditions.

Although catalysis by low-valent palladium is ubiquitous and extremely 
synthetically useful, it has several important limitations that stem from 
the fundamental properties of organo-Pd(II) complexes. These include
limited reactivity towards forming certain important types of chemical bond
(for example carbon-halogen and carbon-CF3 linkages) and a high
susceptibility to decomposition pathways such as β-hydride elimination.
These challenges have motivated the study of catalysis by high-valent
palladium as a potentially complementary mechanistic pathway. The first
30 years of palladium chemistry were dominated by the use of low-valent
palladium, but over the past decade the unique reactivity of Pd(III) and
Pd(IV) intermediates has increasingly been recognized and exploited in
catalysis. In the section on high-valent palladium below, we present 
recent advances in the field that demonstrate the relevance and utility 
of high-valent palladium in diverse catalytic applications (discussed in 
detail in Box 1). 


1University of Michigan, Department of Chemistry, 930 North University Avenue, Ann Arbor, Michigan 48109, USA. 

REVIEW

BOX 1
High-valent copper and palladium

In the context of palladium and copper, high-valent compounds
include palladium in the +3 or +4 oxidation state (Pd(III) and Pd(IV),
respectively) and copper in the +3 oxidation state (Cu(III)).
Organometallic complexes of these high-valent metals are defined as 
molecules that contain copper-carbon or palladium-carbon bonds. 
Catalysis by high-valent copper or palladium is defined as a catalytic 
reaction in which the metal is oxidized to form a high-valent 
organometallic intermediate during the catalytic cycle. 

Advances in the chemistry of high-valent copper have enabled 
researchers to address a number of critical issues, including the 
synthetic accessibility of organo-Cu(III) complexes, the viability of
carbon-carbon and carbon-heteroatom bond formation from
discrete organo-Cu(III) species, the catalytic relevance of Cu(III)
complexes and the ability to exploit high-valent copper intermediates 
to improve catalytic reactions and/or discover new reactivity modes. 

There has been a renaissance in the chemistry of high-valent 
palladium over the past decade that has provided key insights into the 
following issues: the synthetic accessibility of high-valent Pd(III) or
Pd(IV) organometallic complexes, the ability of these species to
participate in stoichiometric carbon-carbon and carbon-heteroatom 
bond-forming reactions, the catalytic relevance of these high-valent 
palladium species and the advantages of high-valent palladium 
catalysis (in terms of enhanced substrate range, milder reaction 
conditions and improved chemoselectivity, regioselectivity and/or site 
selectivity) over more-common low-valent analogues. A representative 
high-valent palladium catalytic cycle is shown in the figure. 

Box 1 figure | Representative high-valent palladium catalytic cycle. X, new functional group being introduced.


High-valent copper 
Before 2000, high-valent organometallic copper complexes were rare. 
The occasional examples reported in the literature were stabilized by 
rigid, chelating and/or perfluorinated ligands, as exemplified by
compounds 1-3 in Fig. 1a. Although complexes 1-3 are structurally
interesting, they do not have the characteristic reactivity that has been
attributed to Cu(III) in catalysis. Specifically, they are all inert to
carbon-carbon and carbon-heteroatom bond-forming reactions. As such, these
compounds were largely considered curiosities, whose relevance to copper 
catalysis was tenuous at best. The past ten years have seen tremendous 
developments in this area with the observation and detailed investigation of 
catalytically relevant organo-Cu(III) species in both carbon-carbon and
carbon-heteroatom bond formation. In this section, we will specifically 
focus on two representative areas: high-valent copper intermediates in 
carbon-carbon bond-forming reactions of organocuprates and high-valent 
copper intermediates in carbon-nitrogen and carbon-oxygen couplings. 

Figure 1 | High-valent copper complexes. a, Early examples of isolatable
organometallic Cu(III) complexes. b, Cu(III) intermediates of organocuprate
reactions detected at −100 °C using rapid-injection NMR spectroscopy. Et,
ethyl; TMS, trimethylsilyl.

An early advance in the chemistry of high-valent copper came from
investigations of carbon-carbon bond-forming reactions of organocuprates
with enones, alkyl halides and allylic electrophiles. Computational studies
of these transformations implicated a Cu(III) intermediate in the key
carbon-carbon bond-forming step. However, for many years little
experimental evidence was available to support this hypothesis, as the
putative Cu(III) compounds proved too transient for detection using
standard spectroscopic techniques. In 2007, rapid-injection NMR
spectroscopy (RI-NMR) was introduced as a method to directly observe
Cu(III) species such as 4-7 in real time (Fig. 1b). Remarkably, when
generated using RI-NMR, the Cu(III) adducts could be detected and fully
characterized at −100 °C (refs 22-26). Furthermore, on being warmed,
these discrete organo-Cu(III) intermediates underwent carbon-carbon
bond-forming reductive elimination.

This field is still in its infancy, but the ability to study directly the
reactivity of high-valent organo-Cu(III) species has begun to provide
mechanistic insights of direct relevance to copper catalysis. For example,
Lewis basic additives such as cyanide, phosphines, pyridines and amines
have been known for decades to improve the yield and/or rate of
organocuprate conjugate addition reactions. Some researchers have
proposed that the primary role of these additives is to enhance the
solubility of copper starting materials or intermediates. By contrast,
other groups have speculated that these Lewis bases have a more
intimate role in the reaction mechanism by binding to copper and
tuning its reactivity towards oxidative addition and/or carbon-carbon
bond-forming reductive elimination.

RI-NMR has provided a way of interrogating these possibilities directly.
A series of Cu(III) complexes of general structure (CH3CH2)(CH3)2Cu(III)LB,
where LB denotes a Lewis basic ligand, were prepared using this technique
and evaluated as models for conjugate addition intermediates. The nature
of LB was found to have a profound influence on the stability of these
species. For instance, when LB was pyridine, the Cu(III) complex 8 was a
short-lived intermediate at −100 °C (0.5 h to maximum concentration)
(Fig. 2). At this temperature, 8 underwent facile ligand exchange to form
the Cu(III) complex (CH3CH2)(CH3)3Cu(III)Li, as well as carbon-carbon
bond-forming reductive elimination to release propane and (CH3)2Cu(I)Li.
By contrast, under otherwise analogous conditions the
dimethylaminopyridine complex 9 was very stable at −100 °C (Fig. 2).
Minimal (<10%) reductive elimination was detected, and ligand exchange to
form (CH3CH2)(CH3)3Cu(III)Li was not observed in this system.

Figure 2 | RI-NMR studies of the effect of Lewis bases on the reactivity of
Cu(III) complexes 8 and 9. Path (i): pyridine-containing intermediate 8 is
short lived and undergoes ligand exchange to form (CH3CH2)(CH3)3Cu(III)Li
as well as carbon-carbon bond-forming reductive elimination. Path (ii): by
contrast, dimethylaminopyridine-containing intermediate 9 is stable at
−100 °C under analogous conditions.

Reference 29 clearly shows that Lewis basic ligands drastically influence
the relative and absolute rates of carbon-carbon coupling at Cu(III)
centres. In the future, more-quantitative RI-NMR analysis of reaction
kinetics and ligand electronic/steric effects should provide further insights
about the rate- and selectivity-determining steps of conjugate addition
and other organocuprate reactions. Such studies will also be invaluable in
establishing the role of Cu(III) in the reactions of organocuprates with
other electrophiles (for example acyl halides, carbonyl compounds and
cyclopropanes). Furthermore, they will provide a mechanistic platform
for rationally designing new synthetic methods in this area.

The ability to use RI-NMR to observe and study organo-Cu(III)
intermediates also has important implications for emerging areas of copper
catalysis. For example, the copper-catalysed carbon-hydrogen arylation
of anilides with diphenyliodonium trifluoromethanesulphonate,
[Ph2I]OTf, was recently reported. In these reactions, carbon-carbon
bond formation occurs meta to the amide substituent, site selectivity
that is complementary to palladium-catalysed, ligand-directed
carbon-hydrogen arylation methods. The mechanism of this reaction remains
controversial. Calculations using density functional theory have
implicated the intermediacy of a Cu(III)-phenyl complex; however, the
accessibility of such intermediates has yet to be confirmed experimentally.
Alternative mechanisms such as Lewis acid catalysis are also plausible.
RI-NMR would serve as a powerful technique for detecting Cu(III) (if
present) during catalysis and/or for interrogating stoichiometric reactions
of Cu(I) model complexes with [Ph2I]OTf. Such studies could help to
clarify the mechanism of this novel transformation as well as to probe
the origin of the meta selectivity.

A second key advance in catalysis by high-valent copper came in the
study of carbon-heteroatom bond formation from Cu(III) intermediates.
As representative examples, we highlight recent investigations of the
copper-catalysed amination of aryl bromides and of copper-catalysed
carbon-hydrogen bond amination and oxygenation. Copper catalysts
are well known to promote the amination of aryl boronic acids,
aryl halides and carbon-hydrogen bonds. Common nitrogen
heterocycles such as pyrazole, pyridone and phthalimide are particularly
effective coupling partners, and the reactions often proceed under mild
conditions. Many researchers have proposed the intermediacy of Cu(III)
in these transformations. However, this hypothesis has been the
subject of significant debate, and others have argued that single-electron
transfer, halide atom transfer or σ-bond metathesis mechanisms at
low-valent Cu(I) or Cu(II) are more likely. Until very recently, no Cu(III)
catalytic intermediates had been detected, and carbon-nitrogen bond
formation from a Cu(III) complex had not been observed directly.

The synthesis of the first isolatable Cu(III)-monoaryl species (10;
Fig. 3a), in 2002, was a turning point for this field. Like many of the
early examples of organometallic Cu(III) compounds (for example 2 and
3 in Fig. 1), complex 10 is stabilized by an electron-donating macrocyclic
ligand. However, unlike its predecessors, 10 is remarkably reactive to
carbon-nitrogen bond formation. For example, 10 reacts stoichiometrically
with amines such as pyridone, oxazolidinone and acetanilide to form
aminated products (11; Fig. 3a). Further study showed that less basic
amines reacted faster, implicating deprotonation of the amine at or before
the rate-determining step of this sequence. Notably, in the absence of
amine, 13 was shown to undergo carbon-bromine bond-forming reductive
elimination to release aryl bromide 12 when trifluoromethanesulphonic
acid is present (Fig. 3b).

The next key question was whether Cu(III) complex 10 and analogues
thereof are relevant to catalytic carbon-nitrogen coupling reactions. The
copper-catalysed amination of aryl bromide 12 with pyridone was
studied to test this possibility. Remarkably, it was possible to detect a
steady-state concentration of Cu(III) intermediate 13 during catalysis by
means of ultraviolet-visible and NMR spectroscopy (Fig. 3b). The
observation of 13 is consistent with its participation in the
turnover-limiting step of the catalytic reaction. Although studies of this
one system do not resolve the controversy surrounding the mechanism of all
copper-catalysed cross-coupling reactions, they provide the first definitive
demonstration that Cu(III) can be catalytically relevant in these
transformations.

Pioneering studies have also recently established a role for high-valent
copper in certain copper-catalysed carbon-hydrogen
bond-functionalization reactions. Before these studies, there was
considerable mechanistic uncertainty about these transformations. Radical
pathways initiated by single-electron transfer from amine, enolate and
electron-rich arene substrates have frequently been proposed.
However, a growing number of examples have been reported with
substrates (for example alkynes and electron-deficient arenes)
that are unlikely to participate in such a mechanism. Very recently, the
catalytic aerobic carbon-hydrogen oxygenation of macrocycle 14 with
methanol (Fig. 3c) was demonstrated. In situ ultraviolet-visible
spectroscopic studies revealed the build-up and subsequent decay of Cu(III)
complex 13 during the catalytic reaction, implicating this species as a
catalytically relevant intermediate. Further kinetic studies suggested that
the rates of Cu(II)-mediated carbon-hydrogen cleavage and of
carbon-oxygen bond formation from Cu(III) are closely matched, which would
explain the observed concentration profile of intermediate 13 during
catalysis.
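
The connection between closely matched rates and an observable build-up and decay of an intermediate can be illustrated with a textbook consecutive-reaction model. The sketch below is only an illustration under simplifying assumptions: it treats formation and consumption of the intermediate as two irreversible first-order steps (A -> B -> C), which is not the rate law established for the catalytic reaction, and the rate constants and function names are hypothetical.

```python
import math


def intermediate_conc(t, k1, k2, a0=1.0):
    """Concentration of B at time t for A -> B -> C (both steps first order).

    k1 and k2 are illustrative rate constants (same time units as 1/t);
    a0 is the initial concentration of A.
    """
    if math.isclose(k1, k2):
        # Limiting form of the general expression when k1 == k2.
        return a0 * k1 * t * math.exp(-k1 * t)
    return a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))


def peak_intermediate(k1, k2, a0=1.0):
    """Return (time of maximum [B], maximum [B])."""
    if math.isclose(k1, k2):
        t_max = 1.0 / k1
    else:
        t_max = math.log(k2 / k1) / (k2 - k1)
    return t_max, intermediate_conc(t_max, k1, k2, a0)


if __name__ == "__main__":
    # Matched rates: the intermediate accumulates to a large fraction of A0.
    print(peak_intermediate(1.0, 1.0))
    # Fast consumption: the intermediate stays at a very low concentration.
    print(peak_intermediate(1.0, 100.0))
```

In this simplified picture, matched rate constants let the intermediate reach roughly a third of the starting concentration before it decays, whereas a much faster consumption step keeps it near the detection limit.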

In summary, ten years ago little was known about the stability and 
reactivity of high-valent copper complexes. Since then there has been 
considerable progress, including the first observation and study of 
carbon-carbon and carbon-heteroatom bond formation from discrete 
organo-Cu(III) species in stoichiometric and catalytic transformations. 
Fundamental studies of organo-Cu(III) are beginning to provide greater
understanding of mechanism, which in turn should allow the rational
development of new synthetic methods.


High-valent palladium 

Sporadic reports over the past 50 years have proposed the intermediacy of 
Pd(IV) in catalysis. However, these proposals were frequently viewed
with scepticism owing to a lack of evidence supporting the viability of
such species. Thus, a first key challenge was to determine whether it was
possible to form, detect and isolate Pd(III) and/or Pd(IV) complexes from
the reactions of Pd(II) precursors with oxidants. Early work established
the viability of this approach and demonstrated that electron-donating,
rigid, multidentate supporting ligands can be used to stabilize high-valent
palladium products. For example, in 1988 organometallic Pd(IV) complex
16 (Fig. 4) was prepared through the reaction of 15 (containing the rigid,
bidentate 2,2'-bipyridine ligand) with CH3I (ref. 58). Similarly,
organo-Pd(III) dimer 18 was produced by reacting 17 (containing bidentate,
electron-donating cyclometalated phosphines) with PhICl2 (ref. 59).
These seminal discoveries have inspired extensive efforts to exploit 
related intermediates in catalysis. In this section, we specifically focus 
on progress in two areas: high-valent palladium intermediates in carbon- 
halogen bond formation and high-valent palladium intermediates in 
trifluoromethylation reactions. 
Figure 3 | High-valent copper complexes involved in carbon-heteroatom
bond formation. a, Stoichiometric carbon-nitrogen bond formation from an
isolated organo-Cu(III). NHR2 represents pyridone, oxazolidinone or
acetanilide. b, In situ observation of an organo-Cu(III) intermediate in the
coupling of aryl bromide 12 with pyridone. Tf, trifluoromethanesulphonate.
c, In situ observation of an organo-Cu(III) intermediate in the oxygenation of
carbon-hydrogen bonds. Me, methyl.

The formation of carbon-halogen bonds has been an important
target reaction for catalysis by high-valent palladium. Halogenated
molecules are valuable starting materials for many organic
transformations including nucleophilic substitutions, metal-catalysed
cross-couplings and Friedel-Crafts alkylations. Notably, carbon-halogen
bond-forming reductive elimination is both thermodynamically
unfavourable and kinetically slow from most Pd(II) complexes. As a
result, most Pd(II)- or Pd(0)-catalysed transformations of aryl or alkyl
halides involve breaking carbon-halogen bonds rather than forming
them (Fig. 5a). By marked contrast, recent work has shown that
many high-valent palladium complexes promote the facile formation
of carbon-halogen bonds. Initial studies in this area focused on
generating high-valent organometallic palladium halide complexes
through the stoichiometric two-electron oxidation of Pd(II) precursors
with electrophilic halogenating reagents (for example Cl2, PhICl2,
N-chlorosuccinimide and XeF2 (collectively abbreviated X+ in Fig. 5a)).
Depending on the structure of the Pd(II) starting material, these
reactions afford either monomeric Pd(IV) complexes such as 19
(ref. 68), 20 (ref. 69) and 21 (ref. 70) or dimeric Pd(III) species such as
22 (ref. 71) (Fig. 4). Many of these high-valent palladium compounds
are isolatable at room temperature (25 °C). However, on being heated,
they all undergo kinetically fast and highly thermodynamically favourable
carbon-heteroatom bond-forming reductive elimination to release
halogenated organic products (Fig. 5a).

Figure 4 | Early examples of Pd(III) and Pd(IV) organometallic complexes.
Ph, phenyl.

Figure 5 | High-valent palladium complexes involved in carbon-halogen
bond formation. a, Carbon-halogen bond-forming reductive elimination is
thermodynamically unfavourable from most Pd(II) species but not from
high-valent palladium complexes such as 19-22. Ac, acetyl; Keq, equilibrium
constant.

The stoichiometric studies shown in Fig. 5a have informed the
development of new palladium-catalysed halogenation reactions that involve
high-valent intermediates. One particularly well-studied example involves
the ligand-directed halogenation of arene and alkane carbon-hydrogen
bonds (Fig. 5b). Electrophilic halogenating reagents (X+) are used
to promote the formation of high-valent palladium intermediates during
catalysis. Depending on the structure of the reagent, diverse
carbon-halogen bonds can be formed. For example, N-fluoropyridinium salts
generate carbon-fluorine bonds, N-chlorosuccinimide promotes the
formation of carbon-chlorine bonds and acetyl hypoiodite provides
access to iodinated products.

Detailed mechanistic studies of the palladium-catalysed
carbon-hydrogen chlorination of benzo[h]quinoline with N-chlorosuccinimide
support the intermediacy of a high-valent palladium species (Fig. 6a).
The resting state of the catalyst was determined to be the dinuclear
succinate-bridged Pd(II) complex 23 (Fig. 6b). This compound is a
kinetically competent catalyst in the presence of added acetate.
Furthermore, rate studies of the 23-catalysed carbon-hydrogen
chlorination reaction show a first-order dependence on this
Pd(II)-Pd(II) dimer and a first-order dependence on the oxidant. These data
are consistent with rate-limiting two-electron oxidation of 23 to generate
the high-valent Pd(III)-Pd(III) dimer 22 (Fig. 6b). This Pd(III)-Pd(III)
intermediate could not be observed under the catalytic conditions, as
is to be expected when oxidation is rate limiting. However, dimer 22 could
be synthesized independently at −78 °C. On being warmed to 23 °C, 22
underwent carbon-chlorine bond-forming reductive elimination to
release chlorinated product 24 in 84% yield, further supporting its
intermediacy in catalysis.
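
The kinetic argument above can be sketched numerically. The example below assumes the simplest rate law compatible with the reported orders, rate = k_ox[23][oxidant], followed by a fast first-order reductive elimination from the oxidized dimer; the function names and all numerical values are hypothetical and chosen only for illustration, not taken from the cited studies.

```python
import math


def oxidation_rate(k_ox, dimer, oxidant):
    """Rate of the assumed rate-limiting step: first order in the Pd(II)-Pd(II)
    dimer and first order in the oxidant."""
    return k_ox * dimer * oxidant


def steady_state_intermediate(k_ox, k_re, dimer, oxidant):
    """Steady-state concentration of the oxidized dimer, obtained by setting
    its formation rate (k_ox * dimer * oxidant) equal to its consumption rate
    (k_re * intermediate)."""
    return oxidation_rate(k_ox, dimer, oxidant) / k_re


if __name__ == "__main__":
    # Hypothetical values in arbitrary, consistent units.
    k_ox, k_re = 2.0, 1000.0
    dimer, oxidant = 0.5, 0.25
    print(oxidation_rate(k_ox, dimer, oxidant))
    # Doubling either concentration doubles the rate (first order in each).
    print(oxidation_rate(k_ox, 2 * dimer, oxidant))
    # With k_re much larger than k_ox * oxidant, the intermediate stays tiny.
    print(steady_state_intermediate(k_ox, k_re, dimer, oxidant))
```

Under these assumptions the oxidized dimer is consumed far faster than it forms, so its steady-state concentration is a small fraction of the resting-state dimer, in line with the intermediate not being observed during catalysis.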

Figure 6 | Palladium-catalysed chlorination. a, Complex 23 is an efficient
catalyst for the palladium-catalysed carbon-hydrogen chlorination of
benzo[h]quinoline. b, The rate-determining step (r.d.s.) of this
palladium-catalysed reaction is oxidation of 23 by N-chlorosuccinimide to
form the Pd(III)-Pd(III) dimer 22. c, High-valent palladium-catalysed
1,2-arylchlorination (ii) is complementary to low-valent palladium-catalysed
reactions (i) of α-alkenes.

High-valent palladium catalysis has also been exploited for the
halofunctionalization of alkenes (that is, the addition of a halogen
and another functional group), as exemplified by Fig. 6c. The
combination of a palladium catalyst, an alkene and an aryl-metal reagent
such as Bu3SnPh is well known to produce a Pd(II)-alkyl intermediate like 25
(Fig. 6c). However, the fate of this intermediate and the ultimate organic
product of the reaction vary drastically depending on the choice of
oxidant. For example, with oxidants such as dioxygen (which has low
kinetic reactivity with most Pd(II) complexes), 25 undergoes β-hydride
elimination to release styrene product 26 through a conventional
low-valent Pd(II) or Pd(0) manifold (Fig. 6c, reaction (i)). By contrast,
kinetically reactive Cl+ oxidants such as PhICl2 can rapidly intercept
25 to generate putative high-valent palladium intermediates. These can
then undergo carbon-chlorine bond-forming reductive elimination to
release 1,2-arylchlorinated compound 27 (ref. 76; Fig. 6c, reaction
(ii)). Although the intermediacy of Pd(III) and/or Pd(IV) in these
arylchlorination reactions has not yet been definitively confirmed, the
observed reactivity (favouring carbon-chlorine bond formation over
β-hydride elimination) is consistent with such a mechanism. This
highlights another key complementarity between catalysis by low-valent
palladium (where square-planar Pd(II)-alkyl intermediates typically
undergo fast decomposition through β-hydride elimination) and
catalysis by high-valent palladium (where β-hydride elimination
is disfavoured owing to the lack of open coordination sites at octahedral
Pd(III)- and/or Pd(IV)-alkyl complexes). Notably, many related
palladium-catalysed alkene difunctionalization reactions have been reported
over the past six years that also probably proceed through high-valent
palladium pathways.

Another challenging and desirable chemical target for catalysis by
high-valent palladium has been the generation of carbon-CF3 linkages.
Trifluoromethyl groups appear in numerous commercial pharmaceuticals
and drug candidates and can drastically enhance the metabolic stability and
bioavailability of biologically active molecules. Despite the prevalence of
these groups in medicinal chemistry, efficient approaches to introducing
CF3 into organic compounds under mild conditions are limited. Methods
involving metal catalysts are particularly rare and would be powerful
synthetic tools to complement currently available chemical processes.

Many previous efforts to develop catalytic trifluoromethylation
reactions have been hampered by the kinetic inertness of most
metal-CF3 complexes to carbon-CF3 bond formation. For example,
carbon-CF3 coupling at Pd(II) centres requires specialized phosphine ligands
to proceed efficiently (Fig. 7a). By contrast, Pd(IV) complexes
containing simple bidentate, nitrogen donor ligands (N~N) undergo
facile carbon-CF3 bond-forming reductive elimination. For example,
the stoichiometric reaction of (N~N)Pd(II)(aryl)CF3 complexes
with N-fluoropyridinium oxidants affords isolatable high-valent
(N~N)Pd(IV)(aryl)CF3 intermediates. These compounds participate
in rapid carbon-CF3 coupling at temperatures as low as 25 °C (ref. 81).
A related approach has been used to achieve catalytic ligand-directed
trifluoromethylation of aromatic carbon-hydrogen bonds. In this system,
electrophilic trifluoromethylating reagents (CF3+) were used to promote
the formation of high-valent palladium intermediates, which decompose
to afford aryl-trifluoromethylated products. Remarkably, these reactions
proceed efficiently with simple palladium salts as catalysts, and no
external ligands (other than substrate) are required. Subsequent
mechanistic studies suggested that the Pd(IV) complex 28 might be a
catalytic intermediate (Fig. 7b), as it serves as a kinetically competent
catalyst under the reaction conditions. This methodology represents a
transformation (ligand-directed C-H → C-CF3 conversion) that is not at
present accessible using any other transition-metal catalyst, again
highlighting the power of high-valent palladium to do novel chemistry.

In addition to the carbon-halogen and carbon-trifluoromethyl bond- 
forming reactions discussed above, high-valent palladium intermediates 
have also been implicated in the selective transformation of alkane and 
arene carbon-hydrogen bonds into carbon-oxygen, carbon-carbon, 
carbon-nitrogen and carbon-sulphur linkages. Detailed mechanistic 
investigations of catalytic carbon-hydrogen acetoxylation and
arylation have provided evidence consistent with the formation of
high-valent palladium intermediates in these reactions as well.

There has been much recent progress in high-valent palladium
catalysis. Over the past decade, numerous organometallic Pd(IV) and Pd(III)
complexes have been synthesized by the reaction of Pd(II) starting
materials with strong oxidants. A wide range of carbon-carbon and
carbon-heteroatom bond-forming reductive elimination reactions can
be achieved from these species, and the selectivity, reactivity and
mechanisms of these transformations have been studied in detail.
Furthermore, a number of these species have been shown to be kinetically
competent catalysts for carbon-hydrogen bond halogenation,
trifluoromethylation and other reactions. These results have firmly
established the feasibility and synthetic utility of high-oxidation-state
palladium catalysis in organic synthesis, and the field shows great
potential for the future.

Figure 7 | Palladium-catalysed trifluoromethylation. a, Carbon-CF3 bond
formation from Pd(II) requires specialized phosphine ligands. TES,
triethylsilyl; Cy, cyclohexyl; i-Pr, iso-propyl. b, Using a high-valent
palladium strategy, catalytic carbon-hydrogen trifluoromethylation has been
developed through putative Pd(IV) intermediate 28.


Comparison and contrast 

As discussed here, high-valent copper chemistry has several features 
in common with high-valent palladium chemistry. First, although for 
each metal the specific transformations that have been selected for 
detailed mechanistic investigation vary significantly, Cu(III) and
Pd(III)/Pd(IV) species have been shown to participate in closely related
carbon-carbon and carbon-heteroatom bond-forming reductive
elimination reactions. One particularly striking example is the accessibility
of carbon-halogen bond-forming reductive elimination from Cu(III)
complex 13 as well as from Pd(IV)/Pd(III) complexes 19-22. Similar
ligand environments have been shown to stabilize high-valent complexes
of both metals. In particular, rigid multidentate ligands (such as the
macrocycles of copper complexes 2, 10 and 13 and the cyclometalated
benzo[h]quinoline of Pd(IV) species 21 and 28 and Pd(III) complex 22)
tend to slow competing reductive elimination processes. Furthermore,
the presence of multiple highly electron-donating σ-aryl or σ-alkyl
ligands (as in Cu(III) complexes 4-9 and Pd(IV) complexes 16 and
19-21) facilitates the detection or isolation of high-valent species of both
copper and palladium. 

Recent examples of copper- or palladium-catalysed oxidation reactions 
reveal additional intriguing similarities. There are a multitude of catalytic 
carbon-carbon and carbon-heteroatom coupling reactions that share the 
following features: a copper or palladium catalyst, an oxidant and an 
organic substrate that is a precursor to a metal-carbon bond (such as 
an aryl halide, a carbon-hydrogen bond or a transmetalating reagent).
Three examples of such transformations are shown in Fig. 8. Although 
detailed mechanistic analysis will be required to establish firmly the 
pathway for each system, it seems likely that many (if not all) of these 
reactions proceed through high-valent copper or palladium manifolds. 
In the first example, copper and palladium catalyse the same overall 
reaction, the ligand-directed carbon-hydrogen acetoxylation of 
2-phenylpyridine (Fig. 8a). In the second example, the same oxidant
(S-(trifluoromethyl)dibenzothiophenium) is used to effect the
trifluoromethylation of two different organic substrates (Fig. 8b). Finally,
in the third example, both metals catalyse the carbon-hydrogen arylation of
indole with diaryliodonium salts (Fig. 8c).
Figure 8 | Oxidative bond-forming reactions catalysed by copper and
palladium. These reactions exemplify similarities and differences between 
copper and palladium. 
The reactions in Fig. 8 not only illustrate key similarities but also
highlight key differences or complementarities between the oxidative
chemistry of palladium and that of copper. For example, the
palladium-catalysed carbon-hydrogen acetoxylation of 2-phenylpyridine
(Fig. 8a) requires the use of PhI(OAc)2 as the terminal oxidant. This
reagent is quite expensive and generates an equivalent of iodobenzene waste
with each catalytic turnover. By marked contrast, the copper-catalysed
acetoxylation uses abundant and atom-economical dioxygen as the
oxidant. The ability to generate high-valent copper using dioxygen is
currently a distinct advantage of high-valent copper catalysis.
Although dioxygen is thermodynamically capable of oxidizing Pd(II) to
Pd(IV), most organo-Pd(II) intermediates are kinetically inert to
oxidation by dioxygen. However, two recent reports have shown that aerobic,
palladium-catalysed, ligand-directed carbon-hydrogen oxygenation is
possible (potentially through high-valent palladium intermediates),
suggesting a promising future in this area.

As shown in Fig. 8b, both copper and palladium catalyse carbon-CF3
bond-forming reactions with S-(trifluoromethyl)dibenzothiophenium.
Catalytically competent Pd(IV) intermediates have been observed and
isolated in the carbon-hydrogen trifluoromethylation reaction (Fig. 7b). By
contrast, mechanisms involving Cu(III)(aryl)CF3 intermediates have been
proposed but remain to be confirmed experimentally for the
copper-catalysed trifluoromethylation of boronic acids. These two examples
demonstrate another key complementarity between palladium and
copper catalysis. High-valent palladium catalysis has been used to
transform carbon-hydrogen substrates into many different functional groups
(with trifluoromethyl being just one example), and palladium-catalysed
carbon-hydrogen oxidation is an extremely common, general,
well-studied reaction. By marked contrast, high-valent copper catalysis has
predominantly focused on prefunctionalized substrates such as aryl
boronic acids and aryl halides (Fig. 8b). At present, copper-catalysed
carbon-hydrogen bond oxidation reactions are comparatively rare and
the range of possible substrates is significantly narrower than that of
analogous palladium-catalysed reactions. For example, whereas
the copper-catalysed functionalization of unactivated alkane
carbon-hydrogen bonds remains highly challenging, such transformations are
increasingly common using palladium. The development of more-robust
and general methods for carbon-hydrogen bond oxidation
through catalysis by high-valent copper is likely to be a major thrust of
research in this field.

Finally, both copper and palladium catalyse the arylation of indole 
with diaryliodonium salts; however, the site selectivities of these two 
reactions are orthogonal (Fig. 8c). Whereas the palladium-catalysed 
reaction results in selective arylation at the 2-position, the
copper-catalysed methods can be tuned to give exclusive arylation at the
3-position. Site selectivity is one of the most difficult challenges in
the field of carbon-hydrogen functionalization. As such, the ability to 
tune selectivity as a function of the metal is of great potential synthetic 
utility. Thus far, neither of these transformations has been the subject of 
detailed mechanistic analysis, but the generation of Cu(III) and Pd(IV)
intermediates has been suggested in both cases. In combination, the 
examples in Fig. 8 demonstrate the tremendous opportunities available 
in the concurrent development of high-valent copper and palladium 
catalysis. 


Looking forward 


The fields of high-valent palladium and copper chemistry are sure to have 
a bright and rapidly expanding future. It will be critical to increase our 
understanding of, and enhance the chemo-, regio- and stereoselectivity 
of, catalytic processes involving high-valent copper and palladium inter- 
mediates. In many cases, the coordination sphere of these high-valent 
metal centres contains multiple possible partners for reductive bond- 
forming reactions. The ability to control the chemoselectivity of the 
bond-forming event is of central importance in achieving efficient, 
high-yield catalytic transformations. In addition, the identification of 
chiral ligands that are compatible with catalysis by high-valent copper 
and/or palladium could potentially allow novel asymmetric conjugate
addition, aryl-aryl coupling, carbon-hydrogen oxidation and/or alkene
difunctionalization reactions, which would all be of great value for 
organic synthesis. 

Additional future work in this field will focus on expanding the scope 
of the fundamental organometallic reactions that are possible at high- 
valent copper and palladium centres. Despite the impressive progress 
described above, the synthetic power of high-valent organometallic 
intermediates has thus far been explored quite narrowly, with an almost 
exclusive focus on reductive bond-forming reactions. We anticipate that 
the design of new ancillary ligands that even better stabilize high-valent 
palladium and copper will facilitate the study and application of carbon- 
hydrogen activation, σ-bond metathesis, migratory insertion and
nucleopalladation reactions at these metal centres. Such reactions could 
potentially proceed with novel patterns of reactivity and selectivity rela- 
tive to analogous transformations of low-valent analogues. For example, 
several preliminary reports have suggested that carbon-hydrogen 
activation occurs with completely different site selectivities at Pd(IV) 
centres than at Pd(II) centres?!~”*. 

Finally, a number of recent reports suggest that high-valent organo- 
metallic complexes of other late transition metals can catalyse reactions 
similar to those discussed for Cu(III) and Pd(III)/Pd(IV) above. For 
example, complexes of nickel(III), nickel(IV), silver(II) and silver(III) have 
been observed and/or implicated in carbon-halogen and carbon- 
nitrogen bond-forming processes”*"'®°. Further exploration of these is 
likely to uncover many additional applications for high-valent late 
transition metals in catalysis. 

Emerging fungal threats to animal, plant 
and ecosystem health 


Matthew C. Fisher, Daniel A. Henk, Cheryl J. Briggs, John S. Brownstein, Lawrence C. Madoff, Sarah L. McCraw & Sarah J. Gurr 


The past two decades have seen an increasing number of virulent infectious diseases in natural populations and managed 
landscapes. In both animals and plants, an unprecedented number of fungal and fungal-like diseases have recently 
caused some of the most severe die-offs and extinctions ever witnessed in wild species, and are jeopardizing food 
security. Human activity is intensifying fungal disease dispersal by modifying natural environments and thus creating 
new opportunities for evolution. We argue that nascent fungal infections will cause increasing attrition of biodiversity, 
with wider implications for human and ecosystem health, unless steps are taken to tighten biosecurity worldwide. 


Emerging infectious diseases (EIDs) caused by fungi are increasingly 
recognized as presenting a worldwide threat to food security’? 
(Table 1 and Supplementary Table 1). This is not a new problem 
and fungi have long been known to constitute a widespread threat to plant 
species. Plant disease epidemics caused by fungi and the fungal-like 
oomycetes have altered the course of human history. In the nineteenth 
century, late blight led to starvation, economic ruin and the downfall of 
the English government during the Irish potato famine and, in the 
twentieth century, Dutch elm blight and chestnut blight laid bare urban 
and forest landscapes. The threat of plant disease has not abated; in fact, it 
is heightened by resource-rich farming practices and exaggerated in the 
landscape by microbial adaptation to new ecosystems, brought about by 
trade and transportation’, and by climate fluctuations*”. 

However, pathogenic fungi (the causes of mycoses) have not been 
widely recognized as posing major threats to animal health. This perception 
is changing rapidly owing to the recent occurrence of several 
high-profile declines in wildlife caused by the emergence of previously 
unknown fungi®’. For example, during March 2007, a routine census of 
bats hibernating in New York State revealed mass mortalities*. Within a 
group of closely clustered caves, four species of bats were marked by a 
striking fungus growing on their muzzles and wing membranes, and the 
name ‘white nose syndrome’ (WNS) was coined. After the initial outbreak, 
the ascomycete fungus Geomyces destructans was shown to fulfil 
Koch’s postulates and was described as the cause of WNS in American 
bat species”'®. Mortalities exhibiting WNS have subsequently been 
found in an increasing number of bat overwintering sites and, by 
2010, the infection was confirmed to have emerged in at least 115 roosts 
across the United States and Canada, spanning over 1,200 km (ref. 11). 
Bat numbers across affected sites have declined by over 70% and analyses 
have shown that at least one affected species, the little brown bat 
Myotis lucifugus, has a greater than 99% chance of becoming locally 
extinct within the next 16 years (ref. 11). Other species of bats across 
this region are declining as a consequence of this infection, and the 
prognosis for their survival and their role in supporting healthy ecosystems 
is poor’’. 

Cases of this sort are no longer perceived to be atypical. The probability 
of extinction is increasing for some species of North American bats, but 
another fungal infection has caused the greatest disease-driven loss of 
biodiversity ever documented. The skin-infecting amphibian fungus 
Batrachochytrium dendrobatidis was discovered in 1997 (ref. 13) and 
named in 1999 (ref. 14). B. dendrobatidis has been shown to infect over 
500 species of amphibians in 54 countries, on all continents where 
amphibians are found'*"*, and is highly pathogenic across a wide diversity 
of species. Studies using preserved amphibian specimens showed that the 
first appearance of B. dendrobatidis in the Americas coincided with a wave 
of population declines that began in southern Mexico in the 1970s and 
proceeded through Central America to reach the Panamanian isthmus in 
2007 (ref. 17). As a consequence of the infection, some areas of Central 
America have lost over 40% of their amphibian species’®, a loss that has 
resulted in measurable ecosystem-level changes’’. This spatiotemporal 
pattern has been broadly mirrored in other continents'’, and ongoing 
reductions in amphibian diversity owing to chytridiomycosis have 
contributed to nearly half of all amphibian species being in decline 
worldwide”. 

Fungal infections causing widespread population declines are not 
limited to crops, bats and frogs; studies show that they are emerging 
as pathogens across diverse taxa (Table 1), including soft corals (for 
example, sea-fan aspergillosis caused by Aspergillus sydowii)”', bees 
(the microsporidian fungus Nosema sp. associated with colony collapse 
disorder)”, and as human and wildlife pathogens in previously non- 
endemic regions (for example, the emergent virulent VGII lineage 
of Cryptococcus gattii in northwestern North America®’ and Cryptococcus 
neoformans across southeast Asia**). The oomycetes have life histories 
similar to those of fungi and are also emerging as aggressive pathogens of 
animals, causing declines in freshwater brown crayfish (for example, the 
crayfish plague caused by Aphanomyces astaci)”, Tilapia fish (for 
example, epizootic ulcerative syndrome caused by A. invadans)*® and 
many species of plants’”’*. Although the direct causal relationship is 
uncertain in some of these diverse host-pathogen relationships, it seems 
that pathogenic fungi are having a pronounced effect on the global 
biota’. 


Increasing risk of biodiversity loss by fungi 

For infectious disease systems, theory predicts that pathogens will co- 
evolve with, rather than extirpate, their hosts”*°. Such evolutionary 
dynamics mirror population-level processes in which density depend- 
ence leads to the loss of pathogens before their hosts are driven extinct*’. 
For these reasons, infection has not been widely acknowledged as an 
extinction mechanism owing to such intrinsic theoretical biotic limitations**. 
Inspection of species conservation databases would seem to 
confirm this idea. The International Union for Conservation of 
Nature (IUCN) red list database details threats to species worldwide, 
and analysis of the database has shown that of the 833 recorded species 
extinctions, fewer than 4% (31 species) were ascribed to infectious 
disease’. Ecological studies on host-pathogen relationships support this 
finding by showing that lower parasite richness occurs in threatened 
host species, suggesting that parasite decline and ‘fade out’ occurs when 
hosts become rare**. Therefore, given that macroevolutionary and ecological 
processes should promote diversity and prevent infectious diseases 
from driving their host species to extinction, we posed the question 
of whether we are witnessing increasing disease and extinction events 
driven by fungi on an increasingly large scale, or, alternatively, if there is 
evidence that a reporting bias has skewed our opinion of the current 
level of threat. 

Table 1 | Major fungal organisms posing threats to animal and plant species. 

Host: Amphibian species (for example, the common midwife toad, Alytes obstetricans) 
Pathogen (Phylum): Batrachochytrium dendrobatidis (Chytridiomycota) 
Disease dynamics leading to mass mortality in animal and plant hosts: Worldwide dispersal of a hypervirulent lineage by trade. Ultra-generalist pathogen manifesting spillover between tolerant/susceptible species. Extent of chytridiomycosis is dependent on biotic and abiotic context!> . 

Host: Rice (Oryza sativa); Magnaporthe grisea species complex on 50 grass and sedge species, including wheat and barley 
Pathogen (Phylum): Magnaporthe oryzae (Ascomycota) 
Disease dynamics leading to mass mortality in animal and plant hosts: Rice blast disease in 85 countries, causing 10-35% loss of harvest. Global blast population structure determined by deployment of seeds with inbred race-specific disease resistance (RSR). Invasions occur by ‘host hops’ and altered pathogen demographics. 

Host: Bat spp. (little brown bats, Myotis lucifugus) 
Pathogen (Phylum): Geomyces destructans (Ascomycota) 
Disease dynamics leading to mass mortality in animal and plant hosts: New invasion of North American bat roosts occurred in approximately 2006, and disease is spreading rapidly®. Pathogen reservoir may exist in cave soil. Disease is more aggressive compared to similar infections in European bats, possibly owing to differences in roosts and host life histories®. 

Host: Wheat (Triticum aestivum); 28 Puccinia graminis f. tritici species, but P. graminis is found on 365 cereal or grass species 
Pathogen (Phylum): Puccinia graminis (Basidiomycota) 
Disease dynamics leading to mass mortality in animal and plant hosts: Wheat stem rust is present on six continents. Population structure is determined by deployment of RSR cultivars and long-distance spread of aeciospores. Strain Ug99 poses a notable threat to resistant wheat varieties, causing up to 100% crop loss. 

Host: Coral species (for example, the sea fan, Gorgonia ventalina) 
Pathogen (Phylum): Aspergillus sydowii (Ascomycota) 
Disease dynamics leading to mass mortality in animal and plant hosts: Sea-fan aspergillosis caused by a common terrestrial soil fungus?!®°. Epizootics are associated with warm-temperature anomalies. Coral immunosuppression is probably a factor causing decline. 

Host: Bee species (for example, the hive of the domestic honeybee (Apis mellifera) suffering colony collapse disorder) 
Pathogen (Phylum): Nosema species (Microsporidia) 
Disease dynamics leading to mass mortality in animal and plant hosts: Microsporidian fungal infections are associated with colony collapse disorder and declining populations. Pathogen prevalence is probably a part of a multifactorial phenomenon that includes environmental stressors and polyparasitism®”®°. 

Host: Sea turtle species (the loggerhead turtle, Caretta caretta) 
Pathogen (Phylum): Fusarium solani (Ascomycota) 
Disease dynamics leading to mass mortality in animal and plant hosts: Soil-dwelling saprotroph and phytopathogenic fungus. Infection causes hatch failure in loggerhead turtle nests and suboptimal juveniles**. The disease dynamics fulfil Koch’s postulates. Environmental forcing is suspected but not proven. 

Images in Table 1, with permission: A. obstetricans chytridiomycosis mortalities, M.C.F.; M. oryzae, N. Talbot; WNS-affected little brown bats, A. Hicks; P. graminis, R. Mago; G. ventalina infected with A. sydowii, 
D. Harvell; A. mellifera hive suffering from colony collapse disorder, J. Evans; sea turtle eggs infected with F. solani, J. Dieguez-Uribeondo and A. Marco. 

EIDs are those pathogens that are increasing in their incidence, geo- 
graphic or host range, and virulence****. Current attempts to detect EID 
events centre on capturing changes in the patterns of disease alerts 
recorded by disease monitoring programmes. ProMED (the Program 
for Monitoring Emerging Diseases; http://www.promedmail.org) and 
HealthMap (http://healthmap.org) have two approaches for detecting 
and monitoring outbreaks worldwide in plant and animal hosts: first, by 
active reporting of disease outbreaks, and second, by capturing diverse 
online data sources. To ascertain whether there are changing patterns of 
fungal disease, we reviewed all disease alerts in ProMED (1994-2010) 
and HealthMap (2006-10) for combinations of search terms to cata- 
logue fungal alerts. We then classified these according to their relative 
proportion against the total number of disease alerts, and discriminated 
between plant- or animal-associated fungal pathogens (Supplementary 
Table 2). We also searched the primary research literature for reports in 
which EIDs have caused host extinction events, either at the regional 
scale (extirpations) or globally (Supplementary Table 3). These analyses 
show a number of positive trends associated with infectious fungi. 
Overall, fungal alerts comprise 3.5% of the ~38,000 ProMED records 
screened. However, over the period from 1995 to 2010, the relative 
proportion of fungal alerts increased in the ProMED database from 
1% to 7% of the database total (Fig. 1a and Supplementary Table 2). 
This trend is observed for both plant-infecting (0.4% to 5.4%) and 
animal-infecting (0.5% to 1.4%) fungi. HealthMap shows a recent 
(2007-11) positive trend in the proportion of records of fungi infecting 
animals (0.1% to 0.3%) and plants (0.1% to 0.2%), and fungal disease alerts 
were shown to occur worldwide (Fig. 1b). Web of Science literature 
searches and compilation of previous meta-analyses of infection-related 
species extinction and regional extirpation events show that fungi com- 
prise the highest threat for both animal-host (72%) and plant-host 
(64%) species (Fig. 1c and Supplementary Tables 3 and 4). This effect 
is more pronounced for animal hosts (39 animal species affected versus 
4 plant species); moreover, there is a notable increase in host loss during 
the second half of the twentieth century, driven mainly by the emergence 
of B. dendrobatidis (Fig. 1d). This effect is moderated after correcting for 
mass-species loss in regions of high epizootic loss (such as the mass 
extirpations of amphibians in Central America). However, fungi remain 
the major cause (65%) of pathogen-driven host loss after this correction. 
Our estimates are probably conservative owing to the cryptic nature of 
most disease-driven species impacts. For example, the lack of disease- 
related IUCN red list records is due to a lack of baseline data on the 
incidence of pathogens in natural systems compounded by inadequate 
disease diagnostics, reporting protocols and a lack of centralized record- 
ing mechanisms. Hence, the true numbers of extinctions and extirpa- 
tions caused by fungi and oomycetes are likely to be greater as we have 
not been able to categorize the probably high levels of species loss in 
major plant (such as the Phytophthora dieback in Australia caused by 
Phytophthora cinnamomi; Supplementary Table 3) or animal outbreaks 
(for example, the effects of B. dendrobatidis emergence in the American 
wet tropics). We cannot discount the idea that sampling bias owing to 
increasing awareness of pathogenic fungi as EIDs may contribute to the 
patterns that we document. However, because of our observation that 
increases in the amount of disease caused by fungi are seen across many 
sources of data, including disease alerts, the peer-reviewed literature and 
previously noted patterns in human fungal EIDs”, we believe that these 
trends are real. Therefore, the answer to our question seems to be that 
the data do indeed support the idea that fungi pose a greater threat to 
plant and animal biodiversity relative to other taxonomic classes of 
pathogen and hosts, and that this threat is increasing. 


Fungal-disease dynamics leading to host extinction 

Here we illustrate several key biological features of fungi that contribute 
to the epidemiological dynamics underlying contemporary increases in 
disease emergence and host extinction (Box 1). 


High virulence 
Fungi, like some bacterial and viral infections, can be highly lethal to 
naive hosts with rates of mortality approaching 100% (for example, 

BOX 1 
Modelling host extinctions caused 
by pathogenic fungi 


A simple susceptible-infected model shows that the presence of a 
threshold host population size for disease persistence does not 
prevent host extinction during a disease outbreak, especially in cases 
in which a lethal pathogen invades a large host population. In a large 
host population transmission is rapid and all hosts can become 
infected before the host population is suppressed below the threshold. 
The model follows the dynamics of susceptible ($S$) and infected ($I$) 
hosts during the short time duration of an epidemic (deaths that are 
not due to disease, and births, are ignored): 
$dS/dt = -\beta S I$; $dI/dt = \beta S I - \alpha I$, 
where $\beta$ is the pathogen transmission rate, and $\alpha$ is the 
disease-induced death rate. For the parameters shown in Fig. 2a, the 
threshold population size below which the pathogen has a negative 
growth rate is $N_T = 20$ individuals. Figure 2a shows that large host 
populations are rapidly driven extinct, but only a fraction of individuals 
are killed in small host populations. 
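
As a rough illustration of this first model, the sketch below integrates the two equations with a simple forward-Euler scheme, using the transmission rate and disease-induced death rate quoted for Fig. 2a. The function name `fraction_infected`, the step size and the stopping rule are assumptions made for this example and are not taken from the original analysis. Because infected hosts never recover in this model, the fraction of hosts infected during an outbreak is also the fraction eventually killed. Setting dI/dt = 0 gives the threshold as alpha/beta, which is 20 hosts for these values.

```python
def fraction_infected(n0, beta=0.001, alpha=0.02, dt=0.01, t_max=5000.0):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - alpha*I.

    Starts with n0 susceptible hosts and one infected host (as in Fig. 2a) and
    returns the fraction of the initial susceptibles infected by the end of
    the outbreak. dt, t_max and the 1e-6 cut-off are illustrative choices.
    """
    s, i, t = float(n0), 1.0, 0.0
    while t < t_max and i > 1e-6:
        new_infections = beta * s * i * dt
        disease_deaths = alpha * i * dt
        s -= new_infections
        i += new_infections - disease_deaths
        t += dt
    return 1.0 - s / n0


# Threshold host density below which dI/dt < 0: N_T = alpha / beta.
threshold = 0.02 / 0.001

if __name__ == "__main__":
    print("threshold N_T =", threshold)
    for n0 in (10, 30, 100, 1000):
        print(n0, round(fraction_infected(n0), 3))
```

Under these assumptions, outbreaks seeded in populations well above the threshold infect almost every host, whereas outbreaks seeded below it affect only a small fraction.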

Pathogens with a long-lived infectious stage have an increased 
potential to cause host extinction. In this model, the disease is 
transmitted through contact between susceptible hosts and free-living 
infectious spores ($Z$), resulting in infected hosts, $I$: 
$dS/dt = -\beta S Z$; $dI/dt = \beta S Z - \alpha I$; $dZ/dt = \phi I - \mu Z - \beta N Z$, 
where $N = S + I$ is the total host density, $\beta$ is the transmission rate, 
$\alpha$ is the pathogen-induced death rate and $\phi$ is the rate of release of 
spores from infected hosts. Figure 2b shows that the fraction of hosts 
killed in a disease outbreak increases with the duration of the free- 
living infectious spore stage ($1/\mu$, where $\mu$ is the spore mortality rate). 
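
The effect of spore longevity can be sketched in the same way. The example below integrates the spore model with forward Euler for several spore mortality rates. The disease-induced death rate (0.02) and spore release rate (10) follow the Fig. 2b legend; the transmission rate `beta=5e-6` and the initial population of 100 susceptible hosts are hypothetical values chosen only for illustration, because the exponent of the published transmission rate is not legible in this text. The function name `fraction_killed` is likewise an assumption of this example.

```python
def fraction_killed(mu, n0=100.0, beta=5e-6, alpha=0.02, phi=10.0,
                    dt=0.05, t_max=5000.0):
    """Forward-Euler integration of the free-living spore model:

        dS/dt = -beta*S*Z
        dI/dt =  beta*S*Z - alpha*I
        dZ/dt =  phi*I - mu*Z - beta*(S + I)*Z

    Starts with n0 susceptible hosts, one infected host and no spores.
    beta and n0 are assumed values; alpha and phi follow Fig. 2b.
    Returns the fraction of initial susceptibles infected (and so killed).
    """
    s, i, z = float(n0), 1.0, 0.0
    for _ in range(int(t_max / dt)):
        n = s + i
        infections = beta * s * z
        ds = -infections
        di = infections - alpha * i
        dz = phi * i - mu * z - beta * n * z
        s += ds * dt
        i += di * dt
        z += dz * dt
    return 1.0 - s / n0


if __name__ == "__main__":
    for mu in (1.0, 0.1, 0.01):  # spore lifetimes of 1, 10 and 100 days
        print("spore lifetime", 1.0 / mu, "days:", round(fraction_killed(mu), 3))
```

With these assumed values, short-lived spores produce almost no secondary infections, whereas spores that persist for tens of days or longer allow the outbreak to reach most of the host population.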

Saprophytic growth by a pathogen can lead to extinction of the host, 
and even allow the pathogen to persist in the absence of its host. In this 
model, free-living infectious spores are released from infected hosts 
(with rate $\phi$), and can increase in abundance through saprophytic 
growth, with rate $\sigma$. To illustrate the effects of saprophytic pathogen 
growth on host and pathogen equilibria (Fig. 2c), density-independent 
host reproduction (with rate $b$), density-dependent host mortality (with 
rate $d_0 + d_1 N$, where $N = S + I$), and density-dependent spore 
mortalities (at rate $\mu_0 + \mu_1 Z$) were included: 
$dS/dt = bN - (d_0 + d_1 N)S - \beta S Z$; 
$dI/dt = \beta S Z - \alpha I - (d_0 + d_1 N)I$; 
$dZ/dt = \phi I + \sigma Z - (\mu_0 + \mu_1 Z)Z - \beta N Z$. 
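
The two extreme outcomes of the saprophytic-growth model can be sketched numerically. The example below runs the three equations forward with a forward-Euler scheme until the system has settled. The values r = b - d0 = 0.01, alpha = 0.02, phi = 10 and mu0 = 1 follow the Fig. 2c legend; `d0`, `d1`, `beta` and `mu1` are hypothetical values chosen for illustration, because their exponents are not legible in this text. The function name `long_run_state` and the choice to seed the system with a single spore are also assumptions of this example.

```python
def long_run_state(sigma, b=0.011, d0=0.001, d1=1e-4, beta=1e-5, alpha=0.02,
                   phi=10.0, mu0=1.0, mu1=1e-4, dt=0.05, t_max=5000.0):
    """Forward-Euler integration of the saprophytic-growth model:

        dS/dt = b*N - (d0 + d1*N)*S - beta*S*Z
        dI/dt = beta*S*Z - alpha*I - (d0 + d1*N)*I
        dZ/dt = phi*I + sigma*Z - (mu0 + mu1*Z)*Z - beta*N*Z

    with N = S + I. d0, d1, beta and mu1 are assumed values.
    Starts at the disease-free host density (b - d0)/d1 with one spore
    and returns (S, I, Z) at t_max.
    """
    s = (b - d0) / d1   # disease-free host equilibrium
    i, z = 0.0, 1.0     # seed the system with a single free-living spore
    for _ in range(int(t_max / dt)):
        n = s + i
        host_death = d0 + d1 * n
        ds = b * n - host_death * s - beta * s * z
        di = beta * s * z - alpha * i - host_death * i
        dz = phi * i + sigma * z - (mu0 + mu1 * z) * z - beta * n * z
        s += ds * dt
        i += di * dt
        z += dz * dt
    return s, i, z


if __name__ == "__main__":
    for sigma in (0.0, 2.0):
        s, i, z = long_run_state(sigma)
        print("sigma =", sigma, "hosts =", round(s + i, 3), "spores =", round(z, 1))
```

With these assumed values, the pathogen fades out and the host returns to its disease-free density when there is no saprophytic growth, whereas strong saprophytic growth sustains a spore population that drives the host extinct and then persists without it.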

The presence of a tolerant host species can lead to the extinction of a 
susceptible host species. In this model, species A is the tolerant host 
species, which can become infected and shed infectious spores but 
does not die as a result of the disease, whereas the susceptible host 
species (species B) has a disease-induced per-capita mortality rate of 
$\alpha_B$. Figure 2d shows that species B is driven extinct at high densities of 
species A: 
$dS_A/dt = b_A N_A - (d_{A0} + d_{A1} N_A)S_A - \beta_A S_A Z$; 
$dI_A/dt = \beta_A S_A Z - (d_{A0} + d_{A1} N_A)I_A$; 
$dS_B/dt = b_B N_B - (d_{B0} + d_{B1} N_B)S_B - \beta_B S_B Z$; 
$dI_B/dt = \beta_B S_B Z - (d_{B0} + d_{B1} N_B)I_B - \alpha_B I_B$; 
$dZ/dt = \phi_A I_A + \phi_B I_B - \mu Z - \beta_A N_A Z - \beta_B N_B Z$, 
where all parameters are as 
previously defined, but with the subscripts A or B referring to host 
species A or B, respectively. 
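
A minimal numerical sketch of the two-host model is given below, again using forward Euler. The values r = 0.01, phi = 10, alpha_B = 0.05 and mu = 1 follow the Fig. 2d legend, and the density of the tolerant species is set through d_A1 = r_A/N_A as described there. The shared transmission rate `beta`, the density-independent death rate `d0` and the density-dependence strength `d_b1` are hypothetical values chosen for illustration, since they are either not stated or not legible in this text. The function name `final_density_b` is an assumption of this example.

```python
def final_density_b(n_a, r=0.01, d0=0.001, d_b1=1e-4, beta=1e-5,
                    alpha_b=0.05, phi=10.0, mu=1.0, dt=0.05, t_max=4000.0):
    """Forward-Euler integration of the tolerant (A) / susceptible (B) host model.

    Species A carries and sheds the pathogen but suffers no disease-induced
    mortality; species B dies from infection at rate alpha_b.
    beta, d0 and d_b1 are assumed values. Returns the density of species B
    (S_B + I_B) at t_max.
    """
    b = r + d0            # birth rate, b = r + d0 for both species
    d_a1 = r / n_a        # sets the carrying capacity of species A to n_a
    s_a, i_a = float(n_a), 1.0   # one infected individual of species A
    s_b, i_b = r / d_b1, 0.0     # species B starts at its carrying capacity
    z = 0.0
    for _ in range(int(t_max / dt)):
        tot_a = s_a + i_a
        tot_b = s_b + i_b
        death_a = d0 + d_a1 * tot_a
        death_b = d0 + d_b1 * tot_b
        ds_a = b * tot_a - death_a * s_a - beta * s_a * z
        di_a = beta * s_a * z - death_a * i_a
        ds_b = b * tot_b - death_b * s_b - beta * s_b * z
        di_b = beta * s_b * z - death_b * i_b - alpha_b * i_b
        dz = phi * (i_a + i_b) - mu * z - beta * (tot_a + tot_b) * z
        s_a += ds_a * dt
        i_a += di_a * dt
        s_b += ds_b * dt
        i_b += di_b * dt
        z += dz * dt
    return s_b + i_b


if __name__ == "__main__":
    for n_a in (50.0, 1000.0):
        print("N_A =", n_a, "-> N_B =", round(final_density_b(n_a), 3))
```

With these assumed values, a sparse tolerant species cannot sustain the pathogen and species B remains at its carrying capacity, whereas a dense tolerant species maintains enough spores in the environment to drive species B extinct.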


B. dendrobatidis in amphibians, G. destructans in bats and Ophiostoma 
ulmi in elm trees). Virulence is a measure of the relative capacity of a 
microbe to cause damage to a host”*, and high virulence is associated with 
rapid intra-host growth rates, ultimately leading to rapid inter-host trans- 
mission*’**. Fungi have a high reproductive potential and in a large host 
population this effect can result in all individuals becoming infected 
before the population is driven to the low densities at which the pathogen 
can no longer spread (Fig. 2a). Thus, host extirpation can occur before 
density dependence limits the rate of transmission, a feature that has 
contributed to the mass extirpations seen in frog populations across 
the US Sierra Nevada mountains”. Similarly, even if the pathogen does 
not drive the host to complete extinction, it may severely reduce the 
population size to the point at which the species is vulnerable owing to 
catastrophic collapses as a result of stochastic” or Allee effects*®. 


Long-lived environmental stages 

Fungi have remarkably resilient dispersal stages (a feature that they share 
with some spore-forming bacteria, such as Bacillus anthracis). The ability 
to survive independently outside of their host, as a free-living saprophyte 
or as durable spores in the environment, is probably the most important 
feature in driving the emergence of pathogenic fungi, owing to an increased 
risk of transporting the inocula to naive hosts (Fig. 2b)*'. Furthermore, 
pathogenic fungi with a saprophytic stage (called sapronoses; Fig. 2c) can 
lead to host extirpation because their growth rate is decoupled from host 
densities and many fungal diseases threatening natural populations are 
caused by opportunistic fungi with long-lived environmental stages. Many 
fungi in the phylum Ascomycota are common soil organisms and are 
tolerant of salinity with the consequence that, when they enter the marine 
system through freshwater drainage, they are able to infect susceptible 
hosts such as corals (A. sydowii*’), sea otters (Coccidioides immitis*) and 
the nests of loggerhead turtles (Fusarium solani*). In terrestrial environ- 
ments, potentially lethal fungi are ubiquitous, such as the causative agent of 
aspergillosis, Aspergillus fumigatus, and soil surveys have shown that 
Geomyces spp. are common soil organisms. Viable G. destructans has been 
recovered from the soil of infected bat caves*’, showing that the pathogen is 
able to survive and persist in infected roosts when the bats are absent. 
Likewise, long-term persistence of fungal inoculum in the agricultural 
landscape is achieved by quiescent survival on plant debris, such as the 
spores of wheat stem rust (Puccinia graminis), which overwinter on straw 
stubble before infecting a secondary host. 


Generalist pathogens and opportunistic pathogens 

Although many fungi demonstrate extreme host specialization, exem- 
plified by the gene-for-gene interactions between biotrophic fungi and 
their plant hosts, broad host ranges twinned with high virulence can be a 
lethal combination. Fungi exhibit the broadest spectrum of host ranges 
for any group of pathogens, and B. dendrobatidis and the oomycete 
Phytophthora ramorum (the cause of sudden oak death and ramorum 
blight) are known to infect 508 (ref. 16) and 109 (ref. 3) host species, 
respectively. Different host species vary in their susceptibility to infec- 
tion and these differences create the potential for parasite-mediated 
competition when the pathogens concerned are generalists**. Host 
species that can tolerate high infection loads while serving as a source 
of infectious stages (known as pathogen spill-over) act as community 
‘super spreaders’ by maintaining persistent infectious stages in the 
system (Fig. 2d). Invasive North American signal crayfish, which tol- 
erate infection by the oomycete A. astaci, force the infection into more 
susceptible European species that then decline”, and similarly, although 
P. ramorum is deadly to Notholithocarpus densiflorus (tanoak) and 
several Quercus species’, many of its other hosts survive infection but 
generate inoculum themselves for new infections. Furthermore, disease- 
tolerant life-history stages of otherwise susceptible species can maintain 
high pathogen levels leading to extinction dynamics. In chytridiomycosis, 
the long-lived multi-year tadpole stages of amphibians such as the 
mountain yellow-legged frog Rana muscosa and the midwife toad 
Alytes obstetricans are not killed by chytrid infections, but they can 
build up high loads of B. dendrobatidis that can infect and overwhelm 
juvenile metamorphs of the same species, leading to rapid population 
loss*’. Ultimately, when host-generalist pathogens manifest long-lived 
environmental stages, conditions may occur that lead to long-distance 
dispersal and infection of naive hosts and environments’. 


Trade and transport promote globalization of fungi 

Fungi comprise most of the viable biomass in the air, with an average 
human breath containing between one and ten fungal spores**. This 
ability of fungi to disperse results in some species with cosmopolitan 
distributions’””°. However, these species are in the minority and it is 
noticeable that few fungi exhibit truly global distributions; instead they 

Figure 2 | Fungal disease dynamics leading to host extinction. a, The 
presence of a threshold host population size for disease persistence does not 
prevent host extinction during a disease outbreak, especially in cases in which a 
lethal pathogen invades a large host population. In a large host population 
transmission is rapid and all hosts can become infected before the host 
population is suppressed below the threshold (pathogen transmission rate, 
$\beta$ = 0.001 per individual per day; disease-induced death rate, $\alpha$ = 0.02 per day; 
simulations start with one infected individual and $N_0$ susceptible individuals). 
b, Long-lived infectious stages can increase the potential for host extinction. 
The fraction of hosts killed in a disease outbreak is shown as a function of the 
duration of the free-living infectious spore stage (pathogen transmission rate, 
$\beta$ = 5 × 10 °; disease-induced death rate, $\alpha$ = 0.02; rate of release of spores 
from infected hosts, $\phi$ = 10; outbreaks initiated with one infected host in a 
population of $N_0$ susceptible individuals). c, Saprophytic growth: equilibrium 
densities of susceptible and infected hosts and free-living spores as a function of 
the rate of saprophytic growth, $\sigma$. With no (or low levels of) saprophytic growth, 
the basic reproductive rate of the pathogen ($R_0$) is less than 1, the pathogen 
cannot invade the system and the host persists at its disease-free equilibrium 
density. Intermediate levels of saprophytic growth allow the pathogen to invade 


exhibit spatially restricted endemic ranges”’. In many cases, local adapta- 
tion and host specificity are thought to underlie fungal endemicity""*”. 
Nevertheless, when local climatic and vegetative constraints are projected 
globally it becomes clear that potential ranges of pathogenic fungi may be 
much larger than their realized range”. If fungi are contained spatially by 
the combination of physical limits on dispersal, abiotic conditions, host 
distributions and genetic limits on adaptation, then how are pathogenic 
fungi able to overcome these barriers? Although fungi have shown the 
ability to undergo range expansions in response to environmental 
shifts’, human-mediated intercontinental dispersal of unrecognized 
fungal pathogens is the major component in initiating new chains of 
transmission. 

Pathogenic fungi have dispersed alongside early human migrations, 
and several thousand years ago two of these fungi, Coccidioides immitis 
and C. neoformans lineage VNI, seem to have invaded South America and 
southeast Asia, respectively, vectored by humans and their domesticated 
animals**°°. Similar ancient patterns of human-associated disease spread 
are detected by studies of the genome diversity of many plant fungal 
pathogens**. However, more recent increases in fungal disease are 

and persist with the host. High levels of saprophytic growth lead to extinction of 
the host, and the pathogen persists in the absence of the host (host intrinsic rate 
of increase, $r = b - d_0$ = 0.01; density-independent host death rate, 
$d_0$ = 1 × 10>; strength of density dependence in host death rate, $d_1$ = 1 × 
10 +; pathogen transmission rate, $\beta$ = 1 × 10 °; disease-induced death rate, 
$\alpha$ = 0.02; rate of release of spores from infected hosts, $\phi$ = 10; density- 
independent spore mortality rate, $\mu_0$ = 1; strength of density dependence in 
spore mortality rate, $\mu_1$ = 1 × 10°“). d, The presence of a tolerant host species 
(host species A), which can become infected and shed infectious spores, can lead 
to the extinction of a susceptible host species (host species B). Species A does 
not die because of the disease, but species B has a disease-induced per-capita 
mortality rate of $\alpha_B$. Species B is driven extinct at high densities of species A. For 
all parameters, subscripts A or B indicate the host species. Host intrinsic rates of 
increase, $r_A = r_B$ = 0.01; density-independent host death rates, 
$d_{A0} = d_{B0}$ = 1 × 10 *; host birth rates, $b_A = b_B = r_A + d_{A0}$; density- 
independent death rate for species B, $d_{B1}$ = 1 × 10°; rate of release of spores 
from infected hosts, $\phi_A = \phi_B$ = 10; $\alpha_B$ = 0.05; spore mortality rate, $\mu$ = 1. The 
density of tolerant species $N_A$ was varied by varying $d_{A1}$ (the strength of 
density dependence in host species A), such that $N_A = r_A/d_{A1}$. 


attributable to the many-fold increase in fungal-infected trade products 
and food’’. The consequences of recent introductions of pathogens in 
association with trade are well known; examples include the Irish 
Famine’ (a consequence of Phytophthora infestans late blight introduc- 
tion from South America), the destruction of the North American chest- 
nuts” (caused by the importation of Cryphonectria parasitica-infected 
Asian chestnut trees to the east coast of the United States in the early 
twentieth century) and the Second World War introduction of 
Heterobasidion annosum into Italy from the USA (vectored by untreated 
wooden transport crates)’. Human-mediated intercontinental trade has 
also been linked clearly to the spread of animal-pathogenic fungi through 
the transportation of infected vector species. B. dendrobatidis has been 
introduced repeatedly to naive populations worldwide as a consequence 
of the trade in infected, yet disease-tolerant, species such as North 
American bullfrogs (Rana catesbeiana)®®° and African clawed frogs 
(Xenopus laevis)***. Whether the emergence of bat WNS constitutes an 
introduction of G. destructans into North America from Europe or 
elsewhere remains to be shown. However, the widespread but apparently 
non-pathogenic nature of the infection in European bats tentatively 
suggests that the disease may have been vectored from this region in 
contaminated soil®. 


Accelerated evolution of virulence in pathogenic fungi 
Human activities are not only associated with the dispersal of patho- 
genic fungi, they also interact with key fungal characteristics, such as 
habitat flexibility, environmental persistence and multiple reproductive 
modes, to cause the emergence of disease. Importantly, many fungi are 
flexible in their ability to undergo genetic recombination, hybridization 
or horizontal gene transfer®, causing the clonal emergence of patho- 
genic lineages but also allowing the formation of novel genetic diversity 
leading to the genesis of new pathogens” *”. Reproductive barriers in 
fungi are known to evolve more rapidly between sympatric lineages that 
are in the nascent stages of divergence than between geographically 
separated allopatric lineages, in a process known as reinforcement®”. 
As a consequence, anthropogenic mixing of previously allopatric fungal 
lineages that still retain the potential for genetic exchange can drive 
rapid macroevolutionary change. Although many hybrids are inviable 
owing to genome incompatibilities, large phenotypic leaps can be 
achieved by the resulting ‘hopeful monsters’, leading to host jumps 
and increased virulence. Such mechanisms are thought to drive the 
formation of new pathotypes in plant pathogens, and oomycetes as 
well as fungi exhibit the genesis of new interspecific hybrids as lineages 
come into contact. Evidence of the effect of multiple fungal co-dispersal 
events and recombination can also be seen in the recent 
C. gattii outbreaks in northwestern North America. In this case, strains 
that do not normally recombine have increased their virulence by 
undergoing recombination and adaptation to overcome mammalian 
immune responses. Recent studies based on the resequencing of 
B. dendrobatidis genomes have shown that, although several lineages 
exist, only a single lineage (known as the B. dendrobatidis global 
panzootic lineage) has emerged in at least five continents during the 
twentieth century to cause epizootic amphibian declines. Notably, the 
genome of the B. dendrobatidis global panzootic lineage shows the 
hallmarks of a single hybrid origin and, when compared against other 
newly discovered lineages of B. dendrobatidis, is more pathogenic, 
suggesting that transmission and onward spread of the lineage has been 
facilitated by an increase in its virulence. Given that the rate of intra- and 
inter-lineage recombination among fungi will be proportional to the 
contact rates between previously geographically separate populations 
and species, these data from across plant and animal fungal patho- 
systems suggest that the further evolution of new races is inevitable given 
current rates of homogenization of previously allopatric, geographically 
separated, fungal lineages. 


Environmental change as a driver of fungal EIDs 

Climate fluctuations can be a potent cofactor in forcing changing 
patterns of fungal phenology and are known to govern plant fungal 
EIDs. Models of climate change for the coming decades predict increases 
in global temperature, atmospheric CO₂ and ozone, and changes in humidity, 
rainfall and severe weather. For this reason, many interactions must be 
taken into consideration when attempting to predict the future effects of 
climate change on plant diseases: first, the physiological and spatial 
changes that plants may undergo in response to the various 
components of climate change and the resulting effects on the pathogen; 
and second, the effects on the pathogen’s physiology and dispersal 
external to its plant hosts. Frequently, however, experimental models 
have only taken into account one element of climate change, a common 
example being the free-air CO₂ enrichment (FACE) studies that model 
the effects of elevated atmospheric CO₂ (ref. 77). A notable result here has 
been that rice blast severity is higher at higher CO₂ levels. However, 
although there has been a general trend for increased disease severity 
under simulated climate-change conditions, and although some species 
are thought to be changing their distribution in response to these changes 
(for example, P. graminis), other elements of climate change, such as 
increased ozone, have been shown to have the opposite effect (for 
example, in Puccinia recondita). 

Evidence for the idea that climate change has an impact on the 
dynamics and distribution of animal-infecting fungi is less clear-cut than 
that in relation to plant-infecting fungi and, although arguments have 
been made that warming trends may have contributed to the emergence 
of B. dendrobatidis in Central America and Europe, there is active 
debate about these conclusions. Regardless, it is clear that the disease 
state, chytridiomycosis, is linked to environmental factors; regional 
climate warming can increase the local range of the pathogen and 
disease risk is inversely related to rates of deforestation. Correlations 
between ecosystem change and a rise in infection by opportunistic pathogens 
have been proposed to account for the occurrence of coral reef 
declines worldwide. For example, disease caused by a variety of microbes 
threatens hard corals to the extent that two of the most abundant 
Caribbean reef-builders (staghorn and elkhorn corals) are now listed 
under the US Endangered Species Act. Across varied reef systems, the 
occurrence of warm-temperature anomalies leading to bleaching events 
is associated with increases in disease caused by opportunistic pathogens 
such as A. sydowii. In an allied colonial system, colony collapse disorder 
has resulted in steep declines of the European honeybee Apis mellifera in 
Europe and North America. These losses seem to be influenced by a 
mixture of aetiological agents that are fungal (for example, the 
microsporidian Nosema ceranae), viral (for example, Kashmir bee virus 
and Israeli acute paralysis virus) and ectoparasitic (for example, Varroa 
destructor) in origin. So far, no single environmental cause has been 
identified that can account for the apparent reduction in the ability of 
honeybee colonies to resist these infections, and agricultural chemicals, 
malnutrition and modern beekeeping practices have all been suggested as 
potential cofactors for colony collapse disorder. The increasing use of 
azole-based agricultural chemicals has been implicated as a factor underpinning 
the increase in the frequency of multiple-triazole-resistant 
(MTR) isolates of A. fumigatus infecting humans. The widespread agricultural 
use of azoles as a means of combating crop pathogens is speculated 
to have led to selection for MTR alleles, an idea that is supported by 
the recent discovery that resistance clusters onto a single lineage in Dutch 
populations of the fungus. Efforts must now be turned to integrating 
epidemiological studies with those on environmental change so that the 
many possible interactions and outcomes can be assessed, as making 
blanket predictions for fungal diseases is currently impossible. The 
highly coordinated response to the recent outbreak of wheat stem rust 
(P. graminis, strain Ug99) is a positive step towards this goal. 


Fungal EIDs impact food security and ecosystem services 


Impacts of fungal diseases are clearly manifested in crops and there are 
direct measurable economic consequences associated with die-off in 
forest and urban environments. Losses that are due to persistent and 
epidemic outbreaks of fungal and oomycete infection in rice (rice blast 
caused by Magnaporthe oryzae), wheat (rust caused by P. graminis), 
maize (smut caused by Ustilago maydis), potatoes (late blight caused 
by P. infestans) and soybean (rust caused by Phakopsora pachyrhizi) vary 
regionally but pose a current and growing threat to food security. Our 
estimates of loss of food are based on the 2009-10 world harvest statistics 
of five of our most important crops and make certain basic 
assumptions of calorific value and worldwide average production 
(Supplementary Table 1). Our calculations show that even low-level 
persistent disease leads to losses that, if mitigated, would be sufficient 
to feed 8.5% of the 7 billion humans alive in 2011. If severe epidemics in 
all five crops were to occur simultaneously, this would leave food suf- 
ficient for only 39% of the world’s population, but the probability of such 
an event occurring is very low indeed. 
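
As a rough check on the scale of these percentages, the short Python sketch below converts them into head counts. The 7 billion population and the 8.5% and 39% fractions come from the text; the function and variable names are illustrative only, and the underlying crop-loss calculation (Supplementary Table 1) is not reproduced here.

```python
WORLD_POPULATION_2011 = 7_000_000_000  # population figure quoted in the text


def people_fed(fraction_of_population: float,
               population: int = WORLD_POPULATION_2011) -> float:
    """Number of people corresponding to a fraction of the world population."""
    return fraction_of_population * population


# Low-level persistent disease: mitigated losses would feed 8.5% of the population.
fed_by_mitigation = people_fed(0.085)

# Simultaneous severe epidemics in all five crops: food for only 39% of the population.
fed_after_epidemics = people_fed(0.39)
unfed_after_epidemics = WORLD_POPULATION_2011 - fed_after_epidemics

print(f"Fed by mitigating low-level losses: {fed_by_mitigation / 1e6:.0f} million")
print(f"Fed after simultaneous epidemics:   {fed_after_epidemics / 1e9:.2f} billion")
print(f"Left without food in that scenario: {unfed_after_epidemics / 1e9:.2f} billion")
```

On these figures, mitigating low-level disease corresponds to food for roughly 595 million people, whereas the worst-case scenario would leave about 4.3 billion people without sufficient food.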

Invasive tree diseases have caused the loss of approximately 100 million 
elm trees in the United Kingdom and the United States, and 3.5 billion 
chestnut trees have succumbed to chestnut blight in the United States 
(Supplementary Table 5). Losses of western Canadian pine trees to the 
mountain pine beetle-blue-stain fungus association will result in the 
release of 270 megatonnes of CO₂ over the period from 2000 to 2020, with 
a clearly ascribed economic cost both for the wood itself and the carbon 
released. These, and other diseases such as ‘sudden oak death’ in 
California and ‘foliar and twig blight’ and ‘dieback’ on ornamental trees, 
woody shrubs and forestry plants in the European Union, affect ecological 
diversity, are costly to manage and account for huge losses of fixed 
CO₂. Indeed, we calculate regional losses of absorbed CO₂ to total 230-580 
megatonnes for just a handful of diseases (Supplementary Table 5) 
with the higher figure equating to 0.069% of the global atmospheric CO₂. 
We have included both emerging (Jarrah dieback, sudden oak death and 
pine beetle-blue-stain fungus) and emergent diseases (Dutch elm blight 
and chestnut blight), as these represent the few examples for which 
informed estimates are possible. We are unable to quantify any of the 
many other recent emerging diseases, such as red band needle blight of 
pines, Phytophthora alni on alders or pitch pine canker on Monterey 
pines, owing to a lack of data and economic interest, both of which are 
trends that must be reversed. Assessing the economic burden of fungal 
mycoses in animals is a challenging task. Although the impact of fungal 
EIDs is manifested in domestic animal settings, particularly the amphibian 
trade and in regions where virulent lineages have established, reporting 
mechanisms for outbreaks do not widely exist. In natural settings, valua- 
tions have recently estimated the losses to US agriculture that are the result 
of declines in bat populations at more than US$3.7 billion per year (ref. 12). 
However, although broad ecosystem-level impacts of other fungal EIDs of 
wildlife are suspected, economic valuations of the ecosystem services that 
these species support are wholly lacking. 


Mitigating fungal EIDs in animals and plants 

The high socioeconomic value of crops means that detection and control 
of fungal diseases in agriculture far outpace those in natural habitats. 
Epidemiological models have been developed to predict the risk of 
seasonally specific crop pathogens, allowing targeted control, and spe- 
cific threats are assessed through consortia of research, governmental 
and global non-governmental organizations, led by the United Nations 
Food and Agricultural Organization (FAO), and related organizations. 
Scientifically led development of disease-resistant crop varieties has been 
mainly successful, although monocultures have in some instances vastly 
increased the susceptibility of harvests to highly virulent pathogens, a 
pertinent example being P. graminis Ug99. Conversely, although there 
have been some attempts to mitigate the fungal disease burden in wildlife 
in situ (most notably efforts to eliminate B. dendrobatidis in infected 
populations with the antifungal itraconazole and the use of probiotic 
bacteria), communicable wildlife EIDs are essentially unstoppable 
once they have emerged. International biosecurity against the spread 
of plant fungal pathogens, although not perfect, is more advanced than 
protocols to protect against the introduction of animal-associated fungi. 
Fundamentally, this is the result of a financial dynamic: wildlife is not 
correctly valued economically, whereas crops are. 

The World Organisation for Animal Health (also known as the OIE) 
and the FAO may be the best-placed authorities to coordinate tighter 
biosecurity controls for trade-associated fungal pathogens of animals. 
The OIE has listed B. dendrobatidis and the crayfish pathogen A. astaci 
in the Aquatic Animal Health Code as internationally notifiable 
infections, and the FAO compiles outbreak data on transboundary 
animal diseases using the emergency prevention information system 
(EMPRES-i). Similarly, the IUCN Wildlife Health Specialist Group 
determines policy that is specific to combating emerging wildlife disease 
internationally. On national scales there are a number of initiatives 
being deployed and in the United States the National Wildlife Health 
Centre has developed the national federal plan to mitigate WNS in 
bats. Intensive monitoring and surveillance will be increasingly import- 
ant in the coming years because predictive modelling and small-scale 
experiments can never fully predict future disease spread and severity. 
An increased political and public profile for the effects of fungal diseases 
in natural habitats is needed to highlight the importance of fungal 
disease control outside of the managed agricultural environment to 
policy makers. If this occurs, then there will be more sympathy for 
attempts to improve the regulatory frameworks that are associated with 
biosecurity in international trade, as this is the most important tool to 
tackle both plant and animal fungal EIDs now and in the future. The 
monitoring of fungal inocula in wild populations should be the utmost 
priority and tighter control of international trade in biological material 
must be imposed, and with considerable haste. Inadequate biosecurity 
will mean that new fungal EIDs and virulent races will emerge at an 
increasingly destructive rate. In addition to better global monitoring and 
control, attention must also be turned to increasing our understanding 
of the interactions between hosts, pathogens and the environment, 
across regional and global scales. Integrated approaches encompassing 
theoretical and practical epidemiology, climate forecasting, genomic 
surveillance and monitoring molecular evolution are needed. These 
should be facilitated by scientists from currently disparate research fields 
entering into regular global discussions to develop clear and urgent 
strategies for working towards the elusive magic bullet for emerging 
fungal diseases: effective prevention and timely control. 

An elementary quantum network of 
single atoms in optical cavities 

Stephan Ritter¹, Christian Nölleke¹, Carolin Hahn¹, Andreas Reiserer¹, Andreas Neuzner¹, Manuel Uphoff¹, Martin Mücke¹, 
Eden Figueroa¹, Joerg Bochmann¹† & Gerhard Rempe¹ 


Quantum networks are distributed quantum many-body systems with tailored topology and controlled information 
exchange. They are the backbone of distributed quantum computing architectures and quantum communication. Here 
we present a prototype of such a quantum network based on single atoms embedded in optical cavities. We show that 
atom-cavity systems form universal nodes capable of sending, receiving, storing and releasing photonic quantum 
information. Quantum connectivity between nodes is achieved in the conceptually most fundamental way—by the 
coherent exchange of a single photon. We demonstrate the faithful transfer of an atomic quantum state and the 
creation of entanglement between two identical nodes in separate laboratories. The non-local state that is created is 
manipulated by local quantum bit (qubit) rotation. This efficient cavity-based approach to quantum networking is 
particularly promising because it offers a clear perspective for scalability, thus paving the way towards large-scale 
quantum networks and their applications. 


Connecting individual quantum systems via quantum channels 
creates a quantum network with properties profoundly different from 
any classical network. First, the accessible state space increases expo- 
nentially with the number of constituents. Second, the distribution of 
quantum states across the whole network leads to non-local correla- 
tions. Further, the quantum channels mediate long-range or even 
infinite-range interactions, which can be switched on and off at will. 
This makes quantum networks tailor-made quantum many-body 
systems with adjustable degrees of connectivity and arbitrary topol- 
ogies, and thus powerful quantum simulators. Open questions like the 
scaling behaviour, percolation of entanglement, multi-partite entanglement 
and quantum phase transitions make quantum networks 
a prime theme of current theoretical and experimental research. 
Similarly, quantum networks form the basis of quantum communication 
and distributed quantum information processing architectures, 
with interactions taking the form of quantum logic gates. 

The physical implementation of quantum networks requires suit- 
able channels and nodes. Photonic channels are well-advanced trans- 
mitters of quantum information. Optical photons can carry quantum 
information over long distances with almost negligible decoherence 
and are compatible with existing telecommunication fibre techno- 
logy. The versatility of quantum networks, however, is largely defined 
by the capability of the network nodes. Dedicated tasks like quantum 
key distribution can already be achieved using send-only emitter 
nodes and receive-only detector nodes. However, in order to fully 
exploit the capabilities of quantum networks, functional network 
nodes are required which are able to send, receive and store quantum 
information reversibly and efficiently. 

The implementation and connection of quantum nodes is a major 
challenge, and different approaches are currently being pursued. An 
intensely studied example is an ensemble of gas-phase atoms, but 
the protocols for the generation of single excitations are inherently 
probabilistic. Another strong contender is a single particle, which 
allows for single-photon emission, quantum gate operations and 
scalability. But single emitters generally exhibit weak light-matter 
interaction resulting, again, in inherently probabilistic information 
exchange and very low success rates. In particular, the reversible 
quantum state mapping between a photonic channel and a single emitter 
in free space is highly inefficient. In their seminal work, Cirac and co-workers 
therefore proposed to overcome these problems by network 
nodes based on single emitters embedded in optical cavities. 

Here we present the experimental realization of this prototype of a 
quantum network. The nodes are formed by single atoms quasi- 
permanently trapped in optical cavities. The cavity-enhanced light- 
matter interaction opens up a deterministic path for interconversion 
of photonic and atomic quantum states. By dynamic control of coherent 
dark states, single photons are reversibly exchanged between 
distant network nodes. We demonstrate faithful quantum state trans- 
fer across a network channel and create entanglement between distant 
nodes. High fidelities and long coherence times are achieved by 
encoding the quantum information in the polarization of the photon 
and the atomic ground-state spin. Our results present a direct photo- 
nic link between two distant single emitters and pave the way for the 
realization of large-scale quantum networks. 

In the following, we describe experiments in which we characterize 
a single network node and the connection of two nodes forming an 
elementary network. The two nodes are operated in independent 
laboratories at a distance of 21 m and are connected by an optical 
fibre link of 60 m length. In a first experiment, we demonstrate that 
single photons can be stored in and retrieved from a single-atom node 
preserving the photonic polarization state. Second, we show the 
faithful transfer of arbitrary atomic quantum states from one node 
to the other. Third, a maximally entangled state of the distant atoms is 
created with a fidelity of up to 98% and is maintained for at least 
100 µs. This coherence time exceeds the entanglement distribution 
time across the network link by two orders of magnitude, and translates 
into a maximum possible entangled node distance of a 20 km 
optical fibre path. Last, local unitary operations are performed on one 
of the nodes, resulting in rotations of the non-local bipartite state 
whereby different maximally entangled states are created. 
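
The link between the 100 µs coherence time and the 20 km fibre path can be checked with a back-of-the-envelope calculation. The sketch below is only an estimate: the fibre refractive index of 1.5 is an assumed typical value that is not given in the text, the function names are illustrative, and only the propagation delay is modelled (the duration of the photon pulse, which also contributes to the distribution time, is ignored).

```python
SPEED_OF_LIGHT_VACUUM = 299_792_458.0  # m/s
FIBRE_REFRACTIVE_INDEX = 1.5           # assumed typical value for silica fibre, not from the text


def fibre_delay_s(length_m: float, n: float = FIBRE_REFRACTIVE_INDEX) -> float:
    """Propagation delay of a photon through a fibre of the given length."""
    return length_m * n / SPEED_OF_LIGHT_VACUUM


def max_fibre_length_m(coherence_time_s: float, n: float = FIBRE_REFRACTIVE_INDEX) -> float:
    """Fibre length a photon can traverse within the coherence time."""
    return coherence_time_s * SPEED_OF_LIGHT_VACUUM / n


delay_60m = fibre_delay_s(60.0)             # the 60 m link used in the experiment
max_length = max_fibre_length_m(100e-6)     # the 100 µs coherence time quoted in the text

print(f"Propagation delay over 60 m: {delay_60m * 1e6:.2f} µs")
print(f"Distance covered in 100 µs:  {max_length / 1e3:.1f} km")
```

Under this assumption the 60 m link contributes a delay of about 0.3 µs, and a photon covers close to 20 km of fibre in 100 µs, in line with the distance quoted above.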


¹Max-Planck-Institut für Quantenoptik, Hans-Kopfermann-Strasse 1, 85748 Garching, Germany. †Present address: Department of Physics, and California NanoSystems Institute, University of California, Santa Barbara, California 93106, USA. 

A universal quantum network node 


Each node in our quantum network consists of a single neutral 
rubidium atom that is quasi-permanently trapped in a high-finesse 
optical resonator (see Methods). The physical parameters put our 
atom-cavity systems in the intermediate-coupling regime of cavity 
QED, with the resonators optimized for highly directional optical 
output into a single mode (=90%) that is efficiently coupled (up to 
90%) to the fibre link (Fig. 1). The reversible conversion between 
quantum states of light and matter is enabled by the dynamic control 
of coherent dark states. The 5²S₁/₂ hyperfine ground states F = 1 and 
F = 2 of the single ⁸⁷Rb atom are coupled in a Raman configuration 
formed by the vacuum mode of the cavity (vacuum Rabi frequency 2g) 
and an external control laser field (Rabi frequency Ω_C) (Fig. 1). 
Control laser and cavity are both blue-detuned by several tens of 
megahertz from the transition to the excited 5²P₃/₂, F′ = 1 state 
which, for an ideal coherent dark state, is never populated. The 
atom is driven coherently between the two ground states by controlling 
Ω_C(t), where t is time. With the atom initially prepared in |F = 2, 
m_F = 0⟩ and Ω_C(0) = 0, a single photon is generated in the cavity 
mode by increasing the control laser power until Ω_C > 2g. Owing to 
the finite resonator decay time, the single photon is immediately 
emitted into the output mode with its temporal wave-packet shape 
determined by Ω_C(t) (Fig. 2). The single-photon character is confirmed 
by a measurement of the second-order correlation function 
g⁽²⁾(τ) (Fig. 2e). The efficiency of the emission process is up to 60%. 
The inverse process of coherent absorption of a single photon is 
performed as a time-reversal of the emission process, and requires a 
decrease in Ω_C(t) timed to occur when a single photon arrives. 

We produce single photons with a nearly time-symmetric envelope 
in one atom-cavity system (Fig. 2a) and coherently absorb them in the 
second system. After a selectable storage time, the photon is re-emitted 
(Fig. 2b). The overall write-read success rate is typically 
(10 ± 1)% (all quoted errors are statistical). With a photon production 
efficiency of 60%, the storage efficiency is calculated to be 17%. 
The maximum efficiencies for photon absorption and emission are set 
by the atom-cavity coupling strength g and can approach unity for 
smaller cavity mode volumes and vanishing scattering losses at the 
cavity mirrors. 
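
The quoted storage efficiency follows from the two measured numbers if the write-read success rate is read as the product of the storage efficiency and the efficiency of re-emitting the photon, with the latter taken equal to the 60% production efficiency. That reading is an assumption made for this sketch, and the function name is illustrative.

```python
def storage_efficiency(write_read_success: float, production_efficiency: float) -> float:
    """Storage efficiency, assuming write-read success = storage x production (re-emission)."""
    return write_read_success / production_efficiency


# Values quoted in the text: 10% overall write-read success, 60% photon production.
eta_storage = storage_efficiency(0.10, 0.60)

print(f"Storage efficiency: {eta_storage * 100:.0f}%")
```

The result, about 17%, matches the value stated in the text.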

For encoding one qubit, we utilize the photonic polarization degree 
of freedom and the atomic ground-state spin, that is, the Zeeman state 
manifold in each hyperfine state (Fig. 2c and d). When applying a 
π-polarized control laser field, the selection rules for electromagnetic 
dipole transitions ensure the faithful mapping of the polarization state 
of an incoming photon onto a well-defined superposition of atomic 
Zeeman states and vice versa for the re-emitted photon. To characterize 
the conversion process, we set the polarization of the incoming photon 
and compare it to that of the retrieved photon after storage in 
the atom. The reconstructed Poincaré sphere after a storage time 
of 2.5 µs is shown in Fig. 2f. The average fidelity of the quantum 
memory, defined as the average overlap with the input state, is 
(92.2 ± 0.4)% for photons arriving in a 1 µs time interval (starting 
at t = 0.2 µs in the graph in Fig. 2b, see Discussion section below). 
The measured fidelity is far above the classical limit of 2/3. This 
experiment is the first to prove coherent transfer of a qubit encoded 
in a single photon onto a single atom. The coherence time of our 
memory has been characterized earlier and exceeds 100 µs (ref. 24). 


Quantum state transfer between single atoms 

In the next experiment, we transfer quantum information from node 
A to the distant node B by sending a single photon across the fibre 
link. The qubit to be transferred is the state of the atom in node A 
represented by 

$$|\psi_A\rangle = \alpha\,|F = 1, m_F = -1\rangle + \beta\,|F = 1, m_F = +1\rangle$$

where α and β are normalized complex-valued amplitudes. We apply 
a π-polarized control laser pulse to the atom at node A, thereby 
generating a photon in the polarization state: 

$$|\psi_{\text{photon}}\rangle = \alpha\,|L\rangle + \beta\,|R\rangle$$

Here |L⟩ and |R⟩ refer to the left and right circular polarization components 
of the photon. After emission of the photon, the atom at node 
A is left in the state |F = 1, m_F = 0⟩ (Fig. 3a, left diagram and ref. 21). 
The photon is transmitted through the optical fibre to node B. The 
atom at node B is initially prepared in the state |F = 1, m_F = 0⟩. Using 
the Raman scheme (Fig. 3b, right diagram) the incoming photon is 
absorbed and its polarization state is mapped onto the atomic state of 
node B, which becomes: 

$$|\psi_B\rangle = \alpha\,|F = 2, m_F = -1\rangle + \beta\,|F = 2, m_F = +1\rangle$$

After absorption, an arbitrary quantum state has been successfully 
communicated from node A to node B. The qubit encoded in atom B 



Figure 1 | A cavity-based quantum network. In the envisaged architecture 
(top), many single-atom nodes are connected by single-photon links. Here we 
explore the universal properties of the system produced by connecting two 
nodes (middle; A and B) within this configuration. Details of the nodes and 
connections are shown in the lower part of the figure. In our experiment, these 
two identical nodes are located in independent laboratories connected by a 60- 
m optical fibre (1). Each node consists of a single rubidium atom (2) trapped in 
an optical dipole trap at the centre of a high-finesse optical cavity (3). Quantum 
state transfer between the atoms and remote entanglement can be achieved via 
exchange of a single photon (4), with the quantum information encoded in the 
internal state of the atom and the polarization of the photon. Both the 
production of a photon (node A) and its storage (node B) are achieved via a 
coherent and reversible stimulated Raman adiabatic passage (see main text for 
details; (5), control laser). Also shown for each node is the atomic level scheme, 
with the green and red arrows indicating the control laser and the exchanged 
single photon, respectively. Δ is the one-photon detuning. Insets, fluorescence 
images of the two single atoms. 

Figure 2 | Universal quantum network node. a, Single photons with the 
indicated temporal envelope (full-width at half-maximum 0.5 µs) are produced 
from node A and subsequently stored in node B. b, Photons are retrieved after a 
storage time of 2.5 µs. c, d, Atomic level schemes and optical transitions of ⁸⁷Rb 
used in the protocol. c, The atom at node B is initially prepared in the state 
|F = 1, m_F = 0⟩ (grey sphere). The polarization of the incoming photon is a 
superposition of σ⁺ and σ⁻ components (yellow and red arrows) and is 
converted into a superposition of the |F = 2, m_F = ±1⟩ states via a stimulated 
Raman adiabatic passage using a π-polarized control laser (green arrows). In 
the process, the phase relation between the σ± polarization components is 
mapped to a relative phase of the atomic Zeeman states. d, The photon is 
recreated by reversing the storage process, thereby mapping the atomic 
superposition state (red and yellow sphere) onto the photon’s polarization. 
e, Incident photon correlations. The measured second-order correlation 
function g⁽²⁾(τ) confirms that single photons are produced by node A. 
f, Quantum tomography of the storage process. The minimal deformation of 
the unit Poincaré sphere proves that every initial photonic quantum state is well 
preserved during storage and retrieval. The fidelity averaged over all input 
states is (92.2 ± 0.4)%. |H⟩, |V⟩, |L⟩, |R⟩, |D⟩ and |A⟩ denote horizontal, 
vertical, left-circular, right-circular, diagonal and anti-diagonal polarization, 
respectively. All quoted errors are statistical. 


is identical to the original one in node A, albeit encoded in the F = 2 
hyperfine manifold. After this quantum state transfer, node A is now 
ready to receive another photonic qubit, whereas node B is capable of 
resending the stored qubit at any time. It is this symmetric and revers- 
ible feature that makes this scheme scalable to arbitrary network 
configurations of multiple atom-cavity systems. 

We analyse the quantum state transfer using quantum process 
tomography. For this purpose, we prepare the atom at node A in a 
state of the form |ψ_A⟩ by a projective measurement (see Methods). 
After quantum state transfer from node A to node B as described 
above, we read out the state of the atom in node B by mapping it onto 
the polarization of a second single photon which is then detected. By 
comparing a sufficient set of initial quantum states in node A with the 
obtained states in node B, we can infer the outcome of the protocol for 
any initial state of node A. The process matrix χ describes the mapping 
of the density matrix ρ_A of the state at node A (taken to be the 
ideally prepared state |ψ_A⟩) onto the transferred state ρ_B at node B 
through the operation ρ_B = Σ_{m,n=0}^{3} χ_mn σ_m ρ_A σ_n†. Here, σ_0 is the identity 
matrix and the other three σ_i are the usual Pauli matrices. For 
calculating χ we normalize the density matrices. 
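
To make the role of the process matrix concrete, the following Python sketch applies a map of the form ρ_B = Σ χ_mn σ_m ρ_A σ_n† to a single-qubit density matrix and averages the resulting fidelity over the six cardinal states of the Bloch sphere. It is an illustration rather than the analysis used in the experiment: the matrix `chi_example` is hypothetical, keeping only a dominant identity element of 0.76 and spreading the remaining weight equally over the other three diagonal elements, and all helper names are made up for this example.

```python
from itertools import product
from math import sqrt

# Identity and Pauli matrices as nested lists (sigma_0 ... sigma_3).
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def apply_process(chi, rho):
    """Return sum_{m,n} chi[m][n] * sigma_m * rho * sigma_n^dagger."""
    out = [[0j, 0j], [0j, 0j]]
    for m, n in product(range(4), repeat=2):
        term = matmul(matmul(PAULIS[m], rho), dagger(PAULIS[n]))
        for i, j in product(range(2), repeat=2):
            out[i][j] += chi[m][n] * term[i][j]
    return out


def density(ket):
    """Density matrix |psi><psi| of a normalized two-component state vector."""
    return [[ket[i] * complex(ket[j]).conjugate() for j in range(2)] for i in range(2)]


def fidelity(ket, rho):
    """Overlap <psi| rho |psi> of a pure state with a density matrix."""
    total = sum(complex(ket[i]).conjugate() * rho[i][j] * ket[j]
                for i in range(2) for j in range(2))
    return total.real


S = 1 / sqrt(2)
# Six cardinal states (+-z, +-x, +-y); averaging over them equals the average over all pure states.
CARDINAL_KETS = [[1, 0], [0, 1], [S, S], [S, -S], [S, 1j * S], [S, -1j * S]]


def average_fidelity(chi):
    """Mean overlap between input and output state over the cardinal states."""
    values = [fidelity(k, apply_process(chi, density(k))) for k in CARDINAL_KETS]
    return sum(values) / len(values)


# Hypothetical, purely diagonal process matrix with a dominant identity element.
chi_example = [
    [0.76, 0, 0, 0],
    [0, 0.08, 0, 0],
    [0, 0, 0.08, 0],
    [0, 0, 0, 0.08],
]

mean_fidelity = average_fidelity(chi_example)
print(f"Average fidelity: {mean_fidelity:.3f}")
```

For a purely diagonal process matrix of this kind the average fidelity depends only on the identity element, as (2χ_00 + 1)/3, so an identity element of 0.76 corresponds to an average fidelity of about 0.84.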

Ideally, the two density matrices are identical, which is equivalent 
to having χ_00 = 1 and all other elements zero. Figure 3c shows the 
absolute value of all elements of χ obtained from a maximum 
likelihood fit to the experimental data. We find χ_00 = 0.76 as the 
dominating matrix element, indicating a high level of control over 
the quantum process. The main deviation from a perfect state transfer 
is a slight depolarization of the quantum state, as indicated by the 
non-vanishing diagonal elements χ_11, χ_22 and χ_33. The state transfer 
can also be characterized by a fidelity defined as the average overlap 
between initial and transferred state. We find a fidelity of (84 ± 1.0)%, 
which proves the quantum character of the state transfer, as the highest 
fidelity achievable with classical information exchange between the 
nodes is 2/3. The overall success rate of the state transfer protocol is 
0.2%, resulting from a production efficiency of the transmitter photon 
at node A of 3% (see Methods), propagation losses leading to a photon 
transmission of 34% and a storage efficiency at node B of about 20%. 


Remote atom-atom entanglement 


The most remarkable property of a quantum network is the existence 
of entangled quantum states shared among several network nodes. 
This is a basis for quantum logic gate operations between nodes and 
can lead to complex quantum many-body phenomena. 

In the following, we demonstrate the creation of remote entangle- 
ment between distant single-atom nodes based on the transmission of 
a single photon. We first prepare the atom at node A in the state 
|F = 2, m_F = 0⟩ (Fig. 4). Applying a π-polarized control laser pulse 
triggers the emission of a single photon and creates the maximally 
entangled state 

$$|\Psi^-_{A\otimes\text{photon}}\rangle = \frac{1}{\sqrt{2}}\left(|1,-1\rangle \otimes |R\rangle - |1,1\rangle \otimes |L\rangle\right)$$

between the spin state of the atom and the polarization of the photon 
that is routed to node B. There it is coherently absorbed and its 
polarization is mapped onto the spin state of atom B. The atom-photon 
entanglement is thus converted into entanglement between 
the two nodes, with the two atoms in the maximally entangled |Ψ−⟩ 
Bell state: 

$$|\Psi^-_{A\otimes B}\rangle = \frac{1}{\sqrt{2}}\left(|1,-1\rangle \otimes |2,1\rangle - |1,1\rangle \otimes |2,-1\rangle\right)$$


We verify the presence of this entangled state by mapping the atomic 
state at each node onto a photon and analysing the polarization correlations 
among the two read-out photons. The real part of the resulting 
density matrix, with the read-out performed 7 µs after the creation 
of atom-atom entanglement, is shown in Fig. 4c. We find a fidelity of 
(85 ± 1.3)% with the |Ψ−⟩ Bell state. This exceeds the classical 
limit of 50%, clearly proving the existence of entanglement between 
the two remote atoms. Fidelities as high as (98.7 ± 2.2)% can be 
achieved by further post-selection of photon detection events (see 
Methods). 
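
As a minimal illustration of how such a fidelity is evaluated, the sketch below computes F = ⟨Ψ⁻|ρ|Ψ⁻⟩ for a two-qubit density matrix in plain Python. The Werner-type mixture used as input is an assumption chosen purely for illustration and is not the measured density matrix of Fig. 4c; the basis ordering and the function names are likewise hypothetical.

```python
import math

# Two-qubit basis order (an assumed convention): |00>, |01>, |10>, |11>,
# where 0/1 label the two spin states used at each node.
PSI_MINUS = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]

CLASSICAL_LIMIT = 0.5  # fidelity threshold quoted in the text


def bell_fidelity(rho, bell=PSI_MINUS):
    """Return <bell|rho|bell> for a 4x4 density matrix given as nested lists."""
    total = 0.0
    for i in range(4):
        for j in range(4):
            term = complex(bell[i]).conjugate() * rho[i][j] * bell[j]
            total += term.real
    return total


def werner_state(p):
    """Illustrative mixture p*|Psi-><Psi-| + (1 - p)*I/4 (not measured data)."""
    rho = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            rho[i][j] = p * PSI_MINUS[i] * PSI_MINUS[j]
        rho[i][i] += (1 - p) / 4
    return rho


fidelity = bell_fidelity(werner_state(0.8))
print(f"F = {fidelity:.3f}, above classical limit: {fidelity > CLASSICAL_LIMIT}")
```

For this family of states the fidelity is p + (1 − p)/4, so a mixing parameter of p = 0.8 corresponds to a fidelity of 0.85, comparable to the value reported above.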

The success probability of entanglement creation is 2%. It is the 
product of the photon generation efficiency (40%) at node A, the 
probability with which the photon is delivered to node B (34%) and 
its storage efficiency at node B (14%). The verification process for the 
entanglement, consisting of the production of one photon at each of 
the two nodes and their subsequent detection, has an efficiency of 
0.16%. 
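
The quoted success probabilities follow directly from the efficiencies given in the text. Written out as a worked product (the symbols η_gen, T and S are labels introduced here for illustration only):

```latex
% Remote entanglement creation
P_{\mathrm{ent}} = \eta_{\mathrm{gen}}\, T\, S_{\mathrm{ent}}
                 = 0.40 \times 0.34 \times 0.14 \approx 0.019 \approx 2\%

% Quantum state transfer (3% generation, 34% transmission, about 20% storage)
P_{\mathrm{trans}} = \eta_{\mathrm{gen}}\, T\, S_{\mathrm{trans}}
                   = 0.03 \times 0.34 \times 0.20 \approx 0.002 = 0.2\%
```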

The entanglement created in this experiment exists between the 
spin states of two single atoms at a physical distance of 21 m. Highly 
non-classical correlations between the two atoms are observed for 
100 μs. The fidelity with the |Ψ⁻⟩ Bell state measured 100 μs after
creation of entanglement is (56 ± 3)%, still exceeding the classical
threshold of 50% by two standard deviations. The decoherence of 
Figure 3 | Quantum state transfer between two single-atom network nodes.
a, At node A (left), an arbitrary quantum state is encoded in the Zeeman state
manifold of the single atom (see Methods). This quantum information is
mapped onto the polarization of a single photon which is sent to node B (right).
b, The photonic polarization is mapped to a superposition of atomic Zeeman
states, thereby completing the quantum state transfer from node A to node B.
c, Absolute value of the elements of the process matrix χ for the quantum state
transfer. The average fidelity between the ideal and the read-out transferred
state is (84 ± 1)%, well above the classical limit of 2/3.


Figure 4 | Remote entanglement of two single-atom nodes. a, A single
photon is generated at node A (left), such that the internal state of the atom and
the polarization of the photon are entangled. The photon is sent to node B
(right) where its polarization is mapped onto the atomic state. The grey spheres
indicate the initial state of the atoms. b, This creates entanglement between
nodes A and B that can be maintained for at least 100 μs. The atomic levels
involved in the entangled state are marked with red and yellow spheres. c, For
analysis, both atomic states are converted into single photons. Polarization
tomography on the two photons confirms the entanglement between the two
nodes. We measure a fidelity of F_{|Ψ⁻⟩} = (85 ± 1.3)% with respect to the |Ψ⁻⟩
Bell state. Shown is the real part of the density matrix. The magnitude of each
imaginary part is ≤0.03.


the atom-atom entangled state is dominated by dephasing caused by 
uncorrelated magnetic field fluctuations (of the order of 1 mG) at the 
two individual nodes and position-dependent differential a.c. Stark 
shifts induced by the dipole trap light fields. The dephasing due to 
local magnetic field fluctuations can be significantly reduced by apply- 
ing small magnetic guiding fields (30 mG) along the quantization axis 
of each node. This has been used for the measurements with 100 μs
entanglement duration. The observed remote-entanglement lifetime
exceeds the entanglement creation time (1 μs for creation, transmission
and absorption of an entangling photon) by two orders of magnitude.
Entanglement lifetimes of the order of seconds can be
expected when mapping the Zeeman qubit to magnetic-field-insensitive
clock states using microwave or Raman pulses.

In the limit of unit efficiency, the entanglement scheme presented 
here allows for the deterministic creation of entanglement. In our 
experimental implementation, efficiencies are below one and we 
therefore detect a posteriori entanglement. The detection of entangled
read-out photons indicates that atom-atom entanglement had
been present. Only entanglement attempts that lead to the final detection
of two read-out photons in the mapping process are considered in
our data. The creation of heralded entanglement is possible by
implementing a mechanism that signals the successful storage of a 
transmitted photon at node B without disturbing the stored quantum 
state (see Discussion below). 


Local manipulation of a non-local state 


Nodes A and B are in separated physical locations and thus are inde- 
pendently addressable for local qubit control. When two nodes are 
entangled, unitary operations applied locally at one of the nodes 
change the non-local state of both nodes while the entanglement is 
preserved. Thus local qubit control allows arbitrary maximally 
entangled two-qubit states to be created using a single initial entangled
state as a resource. We demonstrate this capability by creating the
|Ψ⁺⟩ Bell state. We start by preparing the two nodes in the |Ψ⁻⟩ Bell
state as described above. Applying a magnetic field along the quantization
axis only at node B causes a state rotation at twice the Larmor
frequency. The fidelity of the created state with the |Ψ⁻⟩ and the |Ψ⁺⟩
Bell state is plotted as a function of the applied magnetic field in Fig. 5.
The time between entanglement creation and read-out of the atomic
state is fixed at 12.5 μs. As can be seen from Fig. 5, the rotation of the
non-local state results in a sinusoidally varying overlap with the |Ψ±⟩
Bell states. The fidelity with respect to the |Ψ⁺⟩ state reaches a
maximum of (81 ± 2)% at a magnetic field of B = 30 mG. The original
|Ψ⁻⟩ state is recovered with a fidelity of (76 ± 2)% after a spin
Figure 5 | Controlled rotation of the entangled state. A magnetic field, locally
applied at node B, changes the non-local quantum state of the entangled nodes.
The atom at node A is held at zero magnetic field. The phase evolution for a fixed
hold time of 12.5 μs is proportional to the applied magnetic field. Plotted are the
fidelity with the |Ψ⁻⟩ and the |Ψ⁺⟩ Bell state (red and blue data, respectively),
showing a characteristic oscillation. The error bars indicate the statistical
standard error. The solid line is a cosine fit to guide the eye. The initially
prepared |Ψ⁻⟩ state (F_{|Ψ⁻⟩} = (79 ± 1.6)%) can be rotated into a |Ψ⁺⟩ state of
comparable fidelity (F_{|Ψ⁺⟩} = (81 ± 2)%) using a magnetic field of 30 mG.


rotation of 2π. The reduced fidelity with the |Ψ⁻⟩ state at a field of
60 mG is a result of non-negligible Larmor precession during the 
entanglement creation and read-out processes. 
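
The oscillation described above can be reproduced with an idealized sketch: a magnetic field at node B adds a relative phase between the |2,+1⟩ and |2,−1⟩ components at twice the Larmor frequency, and the overlaps with the two Bell states then vary as cos² and sin² of half that phase. The Zeeman coefficient of about 0.70 MHz per gauss for the ⁸⁷Rb F = 2 ground state is standard atomic data assumed here, not a value taken from the text; the model ignores decoherence, state-preparation errors and precession during creation and read-out, and the function names are hypothetical.

```python
import math

# Assumed constant (standard atomic data, not taken from the text):
# Zeeman shift of the 87Rb F = 2 ground state, in Hz per gauss per unit m_F.
ZEEMAN_HZ_PER_GAUSS = 0.70e6
HOLD_TIME_S = 12.5e-6  # fixed hold time quoted in the text


def relative_phase(b_gauss, hold_time_s=HOLD_TIME_S):
    """Phase between |2,+1> and |2,-1> at node B: twice the Larmor phase."""
    larmor_angular = 2 * math.pi * ZEEMAN_HZ_PER_GAUSS * b_gauss
    return 2 * larmor_angular * hold_time_s


def bell_fidelities(b_gauss):
    """Ideal overlaps of the rotated |Psi-> state with |Psi-> and |Psi+>."""
    phi = relative_phase(b_gauss)
    return math.cos(phi / 2) ** 2, math.sin(phi / 2) ** 2


for b_mg in (0, 15, 30, 45, 60):
    f_minus, f_plus = bell_fidelities(b_mg * 1e-3)
    print(f"B = {b_mg:2d} mG: F(Psi-) = {f_minus:.2f}, F(Psi+) = {f_plus:.2f}")
```

Under these assumptions the accumulated phase is close to π at 30 mG and close to 2π at 60 mG, in line with the field values at which the |Ψ⁺⟩ maximum and the |Ψ⁻⟩ revival are reported. The measured fidelities are lower than the ideal values because of the imperfections that this sketch leaves out.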


Discussion 

The body of work presented here constitutes, to our knowledge, the 
first direct coupling of two distant single quantum emitters by 
exchange of a single photon. Our results introduce universal quantum 
network nodes based on single emitters. Single-atom-cavity nodes
surpass previously investigated light-matter interfaces and combine
their specific advantages in one platform. The use of single emitters
offers a clear perspective for heralding and the integration of
quantum gate operations both local and remote. In the following,
we briefly discuss the potential of our specific implementation
of universal quantum network nodes with respect to fidelity and 
efficiency of the described processes, storage time and scalability. 

In all our experiments, atomic state preparation errors due to
non-optimal optical pumping reduce the fidelity. Control lasers that are
not perfectly π-polarized and off-resonant excitations cause deviations
from the ideal transition scheme (Figs 2-4). These errors may
lead to emission of photons with excitation paths different from the 
Raman scheme. A detailed analysis of transition strengths and effec- 
tive Rabi frequencies shows that these excitation paths lead to delayed 
photon emission. The contribution of these photons to the measured 
signal can therefore be suppressed by post-selecting subsets of data based 
on photon arrival times (see Methods). As an example, entangled-state 
fidelities as high as (98.7 ± 2.2)% are reached when considering only the
first 50% of read-out photons from node A and the initial 14% of 
photons retrieved from node B. These results show the great potential 
for achieving very high fidelities if the mentioned imperfections are 
overcome. 

The efficiency achievable with the demonstrated deterministic
entanglement scheme is higher than what can be achieved with
probabilistic schemes. Although our first implementation is not
deterministic and the entanglement is not heralded, our efficiencies
exceed previous demonstrations of remote entanglement by several
orders of magnitude. The main limitation in our experiments is
the moderate atom-cavity coupling strength, g. Efficiencies approaching
unity may be accomplished when the cavity mode volume is
decreased and mirror scattering losses are eliminated. Ultimately,
the loss of photons in the optical fibre connecting two quantum nodes 
will limit the overall efficiency for long distances, underlining the 
importance of a heralding scheme. 
Heralding is crucial in networks with non-unity success probability
of the quantum link, and ensures scalability in more complex and
long-distance quantum networks. For architectures based on single
atom-cavity nodes this is conceptually straightforward. The successful
storage process is associated with a change of the atomic hyperfine
ground state. The depopulation of the initial state can then be probed
spectroscopically, using, for example, cavity-assisted fluorescence
hyperfine state detection or cavity transmission. As the splitting
between the two hyperfine ground states in ⁸⁷Rb is large, optical probing
of the F = 1 population will leave the qubit stored in F = 2
unaltered. The main challenge is to detect F = 1 with high fidelity
before off-resonant excitation leads to decay into F = 2, that is, with
only a few scattered photons. Although the implementation of heralding
is therefore challenging with our cavities, it is certainly feasible with
current cavity technology.

The very weak coupling of the nuclear spin of single atoms to the 
environment can be exploited to boost the coherence times of our 
network nodes by mapping the Zeeman qubit onto magnetic-field-insensitive
clock states. The number of qubits per node may be
increased through the use of optical lattices and single-atom registers.
After the preparation of remote atom-atom entanglement using a 
procedure akin to that shown here, the registers in different nodes 
could be shifted to successively produce many sets of entangled atoms 
which can then be used, for example, for nested entanglement puri- 
fication. This possibility, in combination with the long storage times 
achievable with single atoms and the potential for heralding, represents 
a realistic avenue towards quantum communication over arbitrary 
distances by means of a quantum repeater protocol.

Current advances in photonic technologies allow reconfigurable 
routeing between different nodes, thereby enabling various different 
network topologies. The controlled interaction between arbitrary 
nodes and the plethora of accessible topologies of cavity quantum 
networks are not only an important resource for quantum informa- 
tion processing; cavity networks also constitute a suitable paradigm 
for investigating emergent phenomena, such as quantum phase transitions
of light or percolation of entanglement.


METHODS SUMMARY 


The two independent quantum nodes are designed to operate with similar physical
parameters. In each apparatus, a single ⁸⁷Rb atom is quasi-permanently trapped
inside an optical dipole trap (potential depth U₀/k_B = 3 and 5 mK, where k_B is the
Boltzmann constant) and held at the centre of a high-finesse optical cavity (finesse
6 × 10⁴, mirror distance 0.5 mm, mode waist 30 μm). Both cavities have asymmetric
mirror transmissions of T₁ < 6 p.p.m. and T₂ ≈ 100 p.p.m., leading to a
highly directional (T_out = 0.9) single output mode. Both systems produce photons
on the D₂ line of ⁸⁷Rb at a wavelength of 780 nm. In this configuration, both
atom-cavity systems operate in the intermediate-coupling regime of cavity QED
(coherent atom-cavity coupling g = 2π × 5 MHz, cavity field decay rate
κ = 2π × 3 MHz, atomic polarization decay rate γ = 2π × 3 MHz). When a single
atom is trapped inside each of the cavities, the experimental protocol runs at a
repetition rate of 5 kHz, including optical pumping (20 μs), photon generation
(1 μs), photon storage (1-100 μs) and optical cooling of atomic motion (80 μs).
The necessary laser beams impinge perpendicular to the cavity axis. The presence
and position of single atoms are monitored in real time by an electron-multiplying
CCD camera, which collects atomic fluorescence light (Fig. 1 inset). In combination
with a longitudinally shiftable standing-wave dipole trap, the atoms are
actively positioned at the centre of the cavity mode. Single-atom storage times
are of the order of one minute.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Experimental set-up. The two independent quantum nodes are designed to
operate with similar physical parameters. In each apparatus, a single ⁸⁷Rb atom
is quasi-permanently trapped inside an optical dipole trap (potential depth
U₀/k_B = 3 and 5 mK, where k_B is the Boltzmann constant) and held at the centre of a
high-finesse optical cavity (finesse 6 × 10⁴, mirror distance 0.5 mm, mode waist
30 μm). Both cavities have asymmetric mirror transmissions of T₁ < 6 p.p.m. and
T₂ ≈ 100 p.p.m., leading to a highly directional (T_out = 0.9) single output mode.
Both systems produce photons on the D₂ line of ⁸⁷Rb at a wavelength of 780 nm.
In this configuration both atom-cavity systems operate in the
intermediate-coupling regime of cavity QED (coherent atom-cavity coupling g = 2π × 5 MHz,
cavity field decay rate κ = 2π × 3 MHz, atomic polarization decay rate
γ = 2π × 3 MHz). When a single atom is trapped inside each of the cavities, the
experimental protocol runs at a repetition rate of 5 kHz, including optical pumping
(20 μs), photon generation (1 μs), photon storage (1-100 μs) and optical cooling of
atomic motion (80 μs). The necessary laser beams impinge perpendicular to the
cavity axis. The presence and position of single atoms are monitored in real time by
an electron-multiplying CCD camera, which collects atomic fluorescence light
(Fig. 1 inset). In combination with a longitudinally shiftable standing-wave dipole
trap, the atoms are actively positioned at the centre of the cavity mode.
Single-atom storage times are of the order of one minute.
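
To make the quoted parameters concrete, the short sketch below evaluates two derived quantities: the cooperativity, which indicates why the system sits in the intermediate-coupling regime, and the duration of one protocol cycle. The cooperativity convention C = g²/(2κγ) is an assumption, since the text does not state one, and the 100 μs storage slot is taken as the longest quoted storage time.

```python
# Parameters quoted in the text (rates divided by 2*pi, in MHz).
g_mhz = 5.0      # coherent atom-cavity coupling
kappa_mhz = 3.0  # cavity field decay rate
gamma_mhz = 3.0  # atomic polarization decay rate

# Cooperativity in the assumed convention C = g^2 / (2 * kappa * gamma).
cooperativity = g_mhz**2 / (2 * kappa_mhz * gamma_mhz)

# Durations of the protocol steps in microseconds, using the longest quoted
# storage time (100 us) for the storage slot.
steps_us = {
    "optical pumping": 20.0,
    "photon generation": 1.0,
    "photon storage": 100.0,
    "optical cooling": 80.0,
}
cycle_us = sum(steps_us.values())
repetition_khz = 1e3 / cycle_us

print(f"C = {cooperativity:.2f}")
print(f"cycle = {cycle_us:.0f} us -> {repetition_khz:.1f} kHz")
```

With g, κ and γ of comparable size the cooperativity is of order one, and the four steps add up to roughly 200 μs, consistent with the stated 5 kHz repetition rate.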

Projective atomic state preparation for quantum state transfer. To characterize
the quality of the quantum state transfer from node A to node B, the atom at node
A is prepared in one of six different initial states of the form |ψ_A⟩ = α|F = 1,
m_F = −1⟩ + β|F = 1, m_F = +1⟩, forming a regular octahedron on the Poincaré
sphere. This is achieved through a projective measurement. The atom is initialized
in the state |F = 2, m_F = 0⟩, and subsequently a single photon is generated
using a π-polarized laser pulse. This creates the atom-photon entangled state


$$|\Psi^-_{\mathrm{atom}\otimes\mathrm{photon}}\rangle = \frac{1}{\sqrt{2}}\left(|1,-1\rangle\otimes|R\rangle - |1,1\rangle\otimes|L\rangle\right)$$


Detection of this photon in a
well-defined polarization state projects the atom in node A onto a qubit state
|ψ_A⟩, with (α, β) determined by the particular choice of the detector’s polarization
basis. Following this projective preparation, the known quantum state of node A
is transferred to node B. Read-out of the state of the atom at node B is performed
by mapping it onto a single photon whose polarization can be analysed. In
calculating the fidelity of the quantum process, we assume perfect preparation
of |ψ_A⟩. It is therefore a lower bound on the fidelity of the state transfer.
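
The projective preparation can be sketched numerically. Starting from the atom-photon state (|1,−1⟩⊗|R⟩ − |1,+1⟩⊗|L⟩)/√2, detecting the photon in a polarization a|R⟩ + b|L⟩ leaves the atom in a state proportional to a*|1,−1⟩ − b*|1,+1⟩. The phase conventions chosen below for the linear polarizations H, V, D and A are assumptions made for illustration, as are all function names.

```python
import math

S = 1 / math.sqrt(2)

# Photon polarization states as amplitudes (a, b) of a|R> + b|L>.
# The phase conventions for H, V, D and A are assumptions.
POLARIZATIONS = {
    "R": (1, 0),
    "L": (0, 1),
    "H": (S, S),
    "V": (S, -S),
    "D": (S, 1j * S),
    "A": (S, -1j * S),
}


def projected_atom_state(pol):
    """Atom amplitudes (alpha, beta) of alpha|1,-1> + beta|1,+1> after the
    photon is detected in `pol`, starting from
    (|1,-1>|R> - |1,+1>|L>)/sqrt(2)."""
    a, b = pol
    alpha = complex(a).conjugate()
    beta = -complex(b).conjugate()
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    return alpha / norm, beta / norm


def overlap_sq(s, t):
    """Squared overlap |<s|t>|^2 of two qubit states."""
    inner = s[0].conjugate() * t[0] + s[1].conjugate() * t[1]
    return abs(inner) ** 2


states = {name: projected_atom_state(p) for name, p in POLARIZATIONS.items()}
for name, (alpha, beta) in states.items():
    print(f"{name}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

The six resulting states form three mutually orthogonal pairs, and states from different pairs have a squared overlap of 1/2, which is the geometry of a regular octahedron on the sphere.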

Post-selected fidelities. The ideal Raman schemes depicted in Figs 2-4 lead to 
emission of spatiotemporally well-defined single-photon wave packets with their 
polarization determined by the selection rules. The fidelities reported in the 
previous sections are obtained from analysing correlations between photon 
detection events in different polarization bases. We have identified several experi- 
mental imperfections which cause deviations from this ideal Raman scheme: 
imperfect initial state preparation, misaligned polarization of the control laser 
and off-resonant excitations. These imperfections not only affect polarization 
correlations but also the temporal wave packet shape of the emitted photons. A 
detailed analysis has shown that these imperfections are generally correlated with 
delayed photon emission with respect to the ideal Raman scheme. In this Article, 
we usually evaluate all read-out photons from node A and those photons from 
node B arriving within a 1 μs time interval centred around the maximum of the
photon wave packet (see Fig. 2b). The contribution of the mentioned non-ideal 
processes to the measured fidelities can however be minimized when only 
those photon detection events are analysed that occur early in the photon’s 
temporal wave packet. Indeed, we find close to ideal fidelities for these subsets 
of data. In the atom-atom entanglement experiment the fidelity is increased to 
F_{|Ψ⁻⟩} = (98.7 ± 2.2)% when considering only the first 50% of the ensemble of
detected photons from node A and the initial 14% of the ensemble of photons
from node B. The quoted values for F_{|Ψ⁻⟩} are the unbiased estimator and the
statistical standard error. The likelihood function of F_{|Ψ⁻⟩} is non-Gaussian.
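
The arrival-time selection described above can be illustrated with a simplified sketch that keeps only the earliest fraction of detection events from each node. The arrival times below are hypothetical, and the function is a stand-in for the idea rather than the analysis actually used, which works on correlated pairs of detection events in different polarization bases.

```python
def select_early(arrival_times_us, fraction):
    """Keep the earliest `fraction` of detection events, ordered by arrival time."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(arrival_times_us)
    keep = max(1, round(fraction * len(ordered)))
    return ordered[:keep]


# Hypothetical arrival times (microseconds), for illustration only.
node_a_times = [0.12, 0.35, 0.08, 0.60, 0.22, 0.91, 0.15, 0.44]
node_b_times = [0.30, 0.10, 0.75, 0.05, 0.52, 0.18, 0.66]

early_a = select_early(node_a_times, 0.50)  # first 50% of node A photons
early_b = select_early(node_b_times, 0.14)  # initial 14% of node B photons
print(early_a, early_b)
```

Tightening the selection trades event rate for fidelity: fewer events survive, but those that do are less affected by the delayed-emission processes.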

The imperfections correlated with late photon emission can also be suppressed
by tailoring the single-photon read-out process directly. When the applied
control laser Rabi frequencies Ω_C are kept low and applied for only a short time,
the read-out photon can be made to resemble the post-selected subset. We have 
made use of this weak read-out in the experiments on state transfer and remote 
atom-atom entanglement, thereby optimizing for high fidelities at the expense of 
the efficiency of the process. As all presented schemes are intrinsically deter- 
ministic, future improvements of the imperfect processes mentioned above will 
allow for both fidelities and efficiencies approaching unity, without the necessity 
for any trade-off. 
Success probabilities. Single photons are produced with an efficiency η^B_{F=2} = 0.6
at node B, and η^A_{F=2} = 0.4 and η^A_{F=1} = 0.03 at node A. The indices F = 1 and F = 2
denote the initial hyperfine state of the atom. η^A_{F=1} is deliberately kept low to
suppress off-resonant excitations to nearby hyperfine states. As this is only
relevant for atoms prepared in F = 1 because of the near-resonant cavity, it can be
circumvented by a local transfer of the qubit from the F = 1 to the F = 2 manifold
using optical Raman or microwave pulses. The storage efficiency was S_trans = 0.2
and S_ent = 0.14 in the experiments on quantum state transfer and remote
atom-atom entanglement, respectively. An intracavity photon is outcoupled into one
well-defined free-space mode with a probability of T_out = 0.9, which is matched to
a single-mode optical fibre (fibre coupling efficiency up to 0.9). A fast-moving
mirror switches the network between two configurations. The network’s nodes
are either directly connected through an optical fibre path with a transmission
T_net = 0.4 (measured from before fibre input at node A to fibre output at node B),
or the nodes are disconnected and photons emanating from each node are guided
to separate polarization detection set-ups with an optical path transmission of
T_det = 0.6. With our photodetector quantum efficiency ε = 0.6, this yields a
detection probability of P_det = T_out T_det ε = 0.3 for a given intracavity photon.

From these numbers the success probability for quantum state transfer can be
calculated: P_trans = η^A_{F=1} T_out T_net S_trans = 0.2%. The efficiency of the projective
atomic state preparation at node A is P_prep = η^A_{F=2} P_det = 12% and the experiments
are repeated at a rate of f = 5 kHz. The duty cycle D denotes the fraction of the
total measurement time during which data were taken. One limiting factor is the
condition that a single atom needs to be trapped and localized at the centre of the
cavity mode of node A and B simultaneously. The duty cycle is D_trans = 0.2 and
D_ent = 0.3 in the experiments on quantum state transfer and remote atom-atom
entanglement, respectively. This yields a rate for the state transfer of
R_trans = P_trans f D_trans ≈ 2 per second. The rate of successfully verified attempts, each
consisting of quantum state preparation at node A, quantum state transfer and
detection of the readout photon from node B, is P_prep P_trans η^B_{F=2} P_det f D_trans ≈ 3 per
minute.

The success probability for remote entanglement creation is
P_ent = η^A_{F=2} T_out T_net S_ent = 2%, yielding a rate of R_ent = P_ent f D_ent ≈ 30
entanglement creations per second and P_ent η^A_{F=1} P_det η^B_{F=2} P_det f D_ent ≈ 3 detected
events of a posteriori entanglement per minute. Direct readout of the atomic
states with near-unity efficiency would increase the detected entanglement
event rate to the value of R_ent. Beyond the potential for higher efficiencies via
an increased atom-cavity coupling strength (see Discussion), a higher rate can be
achieved in steeper trapping potentials that allow for more efficient and therefore
faster cooling.
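
The probabilities and rates in this section can be cross-checked with a few lines of arithmetic. The sketch below uses the values quoted in the text; the assignment of the individual photon-production efficiencies to each product is a reconstruction consistent with the quoted results, and the variable names are chosen here for illustration.

```python
# Values quoted in the Methods text.
eta_a_f1 = 0.03   # photon production at node A, atom initially in F = 1
eta_a_f2 = 0.40   # photon production at node A, atom initially in F = 2
eta_b_f2 = 0.60   # photon production (read-out) at node B
t_out, t_net, t_det = 0.9, 0.4, 0.6
detector_qe = 0.6
s_trans, s_ent = 0.20, 0.14
rate_hz = 5e3
d_trans, d_ent = 0.2, 0.3

# Single-event probabilities.
p_det = t_out * t_det * detector_qe
p_trans = eta_a_f1 * t_out * t_net * s_trans
p_prep = eta_a_f2 * p_det
p_ent = eta_a_f2 * t_out * t_net * s_ent

# Rates in events per second.
r_trans = p_trans * rate_hz * d_trans
r_trans_verified = p_prep * p_trans * eta_b_f2 * p_det * rate_hz * d_trans
r_ent = p_ent * rate_hz * d_ent
r_ent_detected = p_ent * eta_a_f1 * p_det * eta_b_f2 * p_det * rate_hz * d_ent

print(f"P_det = {p_det:.2f}, P_trans = {p_trans:.2%}, P_ent = {p_ent:.1%}")
print(f"state transfers: {r_trans:.1f} per second")
print(f"verified state transfers: {r_trans_verified * 60:.1f} per minute")
print(f"entanglement creations: {r_ent:.0f} per second")
print(f"detected entanglement events: {r_ent_detected * 60:.1f} per minute")
```

Small differences from the rounded figures in the text arise because the text rounds intermediate values such as P_det to one significant figure.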
Teneurins instruct synaptic partner
matching in an olfactory map 


Weizhe Hong, Timothy J. Mosca & Liqun Luo


Neurons are interconnected with extraordinary precision to assemble a functional nervous system. Compared to axon 
guidance, far less is understood about how individual pre- and postsynaptic partners are matched. To ensure the proper 
relay of olfactory information in the fruitfly Drosophila, axons of ~50 classes of olfactory receptor neurons (ORNs) form 
one-to-one connections with dendrites of ~50 classes of projection neurons (PNs). Here, using genetic screens, we 
identified two evolutionarily conserved, epidermal growth factor (EGF)-repeat containing transmembrane Teneurin 
proteins, Ten-m and Ten-a, as synaptic-partner-matching molecules between PN dendrites and ORN axons. Ten-m and 
Ten-a are highly expressed in select PN-ORN matching pairs. Teneurin loss- and gain-of-function cause specific 
mismatching of select ORNs and PNs. Finally, Teneurins promote homophilic interactions in vitro, and Ten-m 
co-expression in non-partner PNs and ORNs promotes their ectopic connections in vivo. We propose that Teneurins 
instruct matching specificity between synaptic partners through homophilic attraction. 


The chemoaffinity hypothesis was proposed nearly 50 years ago to 
explain the target specificity of regenerating optic nerves: developing 
neurons “must carry individual identification tags, presumably cyto- 
chemical in nature, by which they are distinguished one from another 
almost, in many regions, to the level of the single neuron”. Many
molecules are now known that guide axons to their target areas,
but few may mediate mutual selection and direct matching between 
individual pre- and postsynaptic partners. Here we show that the 
transmembrane Teneurin proteins instruct the selection of specific 
synaptic partners in the Drosophila olfactory circuit (Supplementary 
Fig. 1). 

In Drosophila, individual classes of ORN axons make one-to-one 
connections with individual classes of second-order PN dendrites 
within one of ~50 discrete glomeruli in the antennal lobe. We refer 
to this specific one-to-one connection as PN-ORN synaptic partner 
matching. Olfactory circuit assembly takes place in sequential steps 
before sensory activity begins. PN dendrites first elaborate within
and pattern the developing antennal lobe, which is followed by
invasion of ORN axons. Importantly, re-positioning PN dendrites
redirects their partner ORN axons without disrupting the connections,
suggesting that proper PN-ORN connections probably
involve direct recognition and matching between partners. 


Matching screens identified Ten-m and Ten-a 

To identify potential PN-ORN matching molecules, we simultaneously 
labelled select PN dendrites and ORN axons in two colours and per- 
formed two complementary genetic screens (Fig. 1a, d). We overex- 
pressed 410 candidate cell-surface molecules, comprising ~40% of the 
potential cell-recognition molecules in Drosophila. In the first screen,
we used Mz19-GAL4 to label DA1, VA1d and DC3 PNs (hereafter
Mz19 PNs), and Or47b-rCD2 to label Or47b ORNs (Fig. 1a, b).
Or47b ORN axons normally project to the VA1lm glomerulus and
are adjacent to Mz19 PN dendrites without overlap. We overexpressed
candidate cell-surface molecules only in Mz19 PNs to identify those that
promoted ectopic connections between Or47b axons and Mz19 dendrites
(Fig. 1a). We found that overexpression of ten-m (P{GS}9267;
Supplementary Fig. 2b) produced ectopic connections (Fig. 1c). 


In the second screen, we labelled Mz19 PNs as above and Or88a 
ORNs using Or88a-rCD2 (Fig. 1d, e). Or88a ORN axons normally 
project to the VA1d glomerulus, intermingling extensively with
VA1d PN dendrites (Fig. 1e). We overexpressed candidate cell-surface
molecules in Mz19 PNs (Fig. 1d) as above and found that overexpres- 
sion of ten-a (P{GE}1914, Supplementary Fig. 2a) partially disrupted 
the intermingling of Or88a axons and Mz19 dendrites (Fig. 1f). 

In addition to impairing PN-ORN matching, ten-m and ten-a over- 
expression shifted Mz19 PN dendrite position (Fig. 1c, f). However, 
mismatching was not a secondary consequence of axon or dendrite 
mispositioning; mispositioning alone, caused by perturbation of 
other genes, does not alter PN-ORN matching. Furthermore,
among 410 candidate molecules, only ten-m and ten-a overexpression 
exhibited mismatching defects, suggesting their specificity in PN- 
ORN matching. 

Both ten-m and ten-a appear to encode type II transmembrane
proteins. They possess highly similar domain compositions and
amino acid sequences; each contains eight EGF-like and multiple YD
(tyrosine-aspartate) repeats within its large carboxy-terminal
extracellular domain (Fig. 1g). Ten-m and Ten-a were initially identified
as tenascin-like molecules, but vertebrate teneurins were later
identified as their true homologues based on sequence and domain
similarity (Fig. 1h). Thus, we refer to Ten-m and Ten-a as Drosophila
Teneurins. Teneurins are present in nematodes, flies and vertebrates.
In humans, teneurin-1 and teneurin-2 are located in chromosomal
regions associated with intellectual disability, and teneurin-4 is
linked to susceptibility to bipolar disorder.

Drosophila ten-m was originally identified as a pair-rule gene
required for embryonic patterning, but this function was recently
shown to be unrelated to ten-m. Teneurins were implicated in
synapse development at the neuromuscular junction (see ref. 26),
and Ten-m also regulates motor axon guidance. Neither the underlying
mechanisms nor their potential roles in the central nervous system
are known. Vertebrate teneurins are widely expressed in the nervous
system and interact homophilically in vitro, suggesting their
potential role as homophilic cell adhesion molecules in patterning
neuronal connectivity.
Figure 1 | PN-ORN synaptic matching screens identify two Teneurins.

a, d, Schematics showing two PN-ORN matching screens. PN dendrites are 
labelled by Mz19-GAL4 driving mCD8GFP and ORN axons by Or47b-rCD2 
(a) or Or88a-rCD2 (d). Each EP line has a transposable element insertion that 
places UAS 5’ to a gene encoding a predicted cell-surface protein, which can be 
overexpressed using Mz19-GAL4. b, c, Or47b axons and Mz19 dendrites do not 
overlap in control (b), but form ectopic connections following Ten-m 
overexpression (c), as seen by axon-dendrite intermingling (arrowhead). 

e, f, Or88a axons and Mz19 dendrites connect at the VA1d glomerulus in 
control (e), but the connection is partially lost following Ten-a overexpression, 
as some of the Or88a axons no longer intermingle with Mz19 dendrites 


Matching expression of Ten-m and Ten-a 


Both Drosophila Teneurin proteins were endogenously expressed in 
the developing antennal lobe (Fig. 2a and Supplementary Fig. 3). At 
48 h after puparium formation (APF), when individual glomeruli just 
become identifiable, elevated Teneurin expression was evident in 
select glomeruli. The subset of glomeruli expressing elevated Ten-m 
was distinct but partially overlapping with that expressing elevated 
Ten-a (Fig. 2a, e). Teneurin proteins were also detected at a low level 
in all glomeruli. Both basal and elevated Teneurin expressions were 
eliminated by pan-neuronal RNA interference (RNAi) targeting the 
corresponding gene (Fig. 2b, c), suggesting that Teneurin proteins are 
produced predominantly by neurons. In a ten-a null mutant we
generated (Supplementary Fig. 2a), all Ten-a expression was eliminated, 
confirming antibody specificity (Fig. 2d). 

The antennal lobe consists of ORN axons as well as PN and local 
interneuron dendrites. We used intersectional analysis to determine the 
cellular source for elevated Teneurin expression. For ten-m, we screened 
GAL4 enhancer traps near the ten-m gene, and identified NP6658 
(hereafter ten-m-GAL4, Supplementary Fig. 2b) that recapitulated the 
glomerulus-specific Ten-m staining pattern (Supplementary Fig. 4a—c). 
We used a FLPout reporter UAS>stop>mCD8GFP to determine the 
intersection of ten-m-GAL4 and an ORN-specific ey-Flp (Fig. 2f and 
Supplementary Fig. 4d-f) or a PN-specific GH146-Flp (Fig. 2g and 
Supplementary Fig. 4g-i). We found that ten-m-GAL4 was selectively 
expressed in a subset of ORNs and PNs. Owing to reagent availability, 
we focused our analysis on five glomeruli (DA1, VA1d, VA1lm, DC3
and DA3), adjacently located on the lateral and anterior side of the 
antennal lobe. In these five glomeruli, Ten-m expression in PN and 
ORN classes matched: high levels in PNs corresponded to high levels in 
ORNs and vice versa (Fig. 2f, g). 

To determine the cellular origin of elevated Ten-a expression, we 
performed tissue-specific RNAi of endogenous Ten-a, as no GAL4 
(arrowhead). Target areas of Or47b (b, c) or Or88a (e, f) axons are outlined.
Mismatching phenotypes are quantified in Supplementary Figs 9k and 10q. The
first three columns in b, c, e, f show separate channels of the same section; the
fourth shows higher magnification of the dashed squares (as in Figs 3, 4 and 5d-g).
Unless indicated, all images in this and subsequent figures are single confocal 
sections and all scale bars are 10 um. g, Domain composition of Drosophila 
Ten-m and Ten-a. aa, amino acids; TM, transmembrane domain. h, Phylogeny 
of the Drosophila Teneurins and related proteins in other species. Branch 
lengths represent units of substitutions per site of the sequence alignment. 
Teneurins are evolutionarily conserved in bilaterians and a unicellular 
choanoflagellate Monosiga brevicollis, but not in cnidarians. 


enhancer trap is available near ten-a. To isolate Ten-a expression in 
ORNs, we drove pan-neuronal ten-a RNAi while specifically suppressing
RNAi in ORNs using tubP>stop>GAL80 and ey-Flp (Fig. 2h).
To restrict Ten-a expression to central neurons, we expressed ten-a 
RNAi in all ORNs (Fig. 2i). We found that Ten-a was highly expressed 
in a subset of ORNs and central neurons, and also showed a matching 
expression in the five glomeruli we focused on (Fig. 2h, i). The 
glomerular-specific differential Ten-a expression in central neurons 
probably arises mainly from PNs as they target dendrites to specific 
glomeruli, and punctate Ten-a staining was observed in PN cell bodies 
(Supplementary Fig. 5). In summary, Ten-m and Ten-a are each 
highly expressed in a distinct, but partially overlapping, subset of 
matching ORNs and PNs (Fig. 2j). 
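
The intersectional logic used above to assign Ten-a staining to ORNs or to central neurons can be written as a small truth table. The following is only an illustrative sketch, not part of the original analysis: it assumes that RNAi is active wherever GAL4 is present and GAL80 is absent, that tubP>stop>GAL80 yields GAL80 only in cells expressing Flp, and that ey-Flp is active in ORNs but not in central neurons. The function names and the two-cell-type simplification are hypothetical.

```python
# Two cell populations considered in this simplified sketch.
CELL_TYPES = ("ORN", "central neuron")


def gal80_present(cell_type, flp_types):
    """tubP>stop>GAL80 gives GAL80 only where Flp has excised the stop cassette."""
    return cell_type in flp_types


def rnai_active(cell_type, gal4_types, flp_types=frozenset()):
    """UAS-RNAi is on where GAL4 is expressed and GAL80 does not suppress it."""
    has_gal4 = cell_type in gal4_types
    return has_gal4 and not gal80_present(cell_type, flp_types)


def remaining_ten_a(gal4_types, flp_types=frozenset()):
    """Cell types in which Ten-a is expected to remain detectable."""
    return {c for c in CELL_TYPES if not rnai_active(c, gal4_types, flp_types)}


# Pan-neuronal RNAi, suppressed in ORNs by ey-Flp-dependent GAL80.
orn_only = remaining_ten_a({"ORN", "central neuron"}, flp_types={"ORN"})

# RNAi in all ORNs (an ORN-restricted GAL4 driver is assumed here).
central_only = remaining_ten_a({"ORN"})
```

In this sketch the first condition leaves Ten-a only in ORNs and the second leaves it only in central neurons, which is the separation the two experiments are designed to achieve.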


Teneurins are required for PN-ORN matching 


To examine whether Teneurins are required for proper PN-ORN 
matching, we performed tissue-specific RNAi (Fig. 3 and Supplemen- 
tary Fig. 2c) in all neurons using C155-GAL4, in PNs using GH146- 
GAL4, or in ORNs using peb-GAL4. To label specific subsets of PN 
dendrites independent of GAL4-UAS, we used the Q binary expression 
system, and converted Mz19-GAL4 to Mz19-QF by bacterial 
artificial chromosome (BAC) recombineering (Supplementary Fig. 2d). 
We could thus perform GAL4-based RNAi knockdown while labelling 
PN dendrites and ORN axons in two colours independent of GAL4. 
We focused our analysis on Mz19 dendrites and Or47b axons, which 
innervate neighbouring glomeruli but never intermingle in wild type 
(Figs 1b and 3a, b). 

Pan-neuronal RNAi of both teneurin genes shifted Or47b axons to 
a position between two adjacent Mz19 glomeruli, DA1 and VA1d 
(Fig. 3c). Moreover, Mz19 dendrites and Or47b axons intermingled 
without a clear border (Fig. 3c, d), reflecting a PN-ORN matching 
defect. We confirmed this using independent RNAi lines targeting 
different regions of the ten-m and ten-a transcripts (Supplementary 
Fig. 6). Further, knocking down ten-m and ten-a only in PNs or only 
in ORNs also led to Mz19-Or47b intermingling (Fig. 3e and 
Supplementary Fig. 7a, d), indicating that Teneurins are required in 
both PNs and ORNs to ensure proper matching. 

Figure 2 | Ten-m and Ten-a are differentially expressed in matching PN and 
ORN classes. a, Developing antennal lobes at 48 h APF stained by antibodies 
(c), respectively. d, A ten-a homozygous mutant eliminated the Ten-a antibody 
staining. e, Summary of elevated Ten-m and Ten-a expression in five select 

Figure 3 | Loss of Teneurins causes PN-ORN mismatching. a, Normally, 
Mz19 dendrites (green) innervate glomeruli adjacent to the VA1lm glomerulus, 
which is itself innervated by Or47b axons (red). The dashed line encircles 
Or47b axons. DC3 PNs are located posterior to DA1/VA1d PNs and Or47b 
ORNs, and are not visible in these sections. c, Mismatching phenotypes in ten-m 
and ten-a RNAi driven by the pan-neuronal driver C155-GAL4. Dashed lines 
encircle Or47b ORN axons, showing intermingling with Mz19 PN dendrites 
(arrowhead). e, Quantification of Mz19-Or47b mismatching phenotypes. For 
all genotypes, n = 15. f, In control, DA1 PNs do not intermingle with Or47b 
ORNs. h, MARCM expression of ten-a RNAi in DA1 PNs causes dendrite 
intermingling with Or47b axons (arrowhead). j, Quantification of mismatching 
phenotypes. For all genotypes, n = 6. Error bars represent s.e.m. ***, P < 0.001 
compared to control. b, d, g, i, Summary showing normal connectivity in 
control (a, f) and mismatching phenotypes following teneurin RNAi 
(c, h). Blue, Ten-m high; orange, Ten-a high. Green outlines, labelled PNs. Red 
outlines, labelled ORNs. Scale bars, 10 µm. 

Next, we examined the contribution of each Teneurin by individual 
RNAi knockdown in ORNs. Knocking down ten-m and, to a lesser 
extent, ten-a, caused mild mismatching (Fig. 3e and Supplementary 
Fig. 7). This was greatly enhanced by simultaneous knockdown of 
both ten-m and ten-a (Fig. 3e), probably because Mz19-Or47b 
mismatching requires weakening connections with their respective 
endogenous partners (Supplementary Fig. 7g). This synergy implies 
that multiple matching molecules can enhance partner matching 
robustness. 

We also tested the functions of individual Teneurins in PNs. We 
found that the Mz19-Or47b mismatching was caused by PN-specific 
knockdown of ten-a, but not ten-m (Fig. 3e and Supplementary Fig. 7). 
As VA1d/DC3 and DA1 PNs arise from separate neuroblast lineages, 
we used mosaic analysis with a repressible cell marker (MARCM) to 
generate neuroblast clones to label and knock down ten-a in DA1 or 
VA1d/DC3 PNs (Fig. 3f-j; see Methods). ten-a knockdown only in 
DA1 PNs (normally Ten-a high) caused their dendrites to mismatch 
with Or47b axons (Fig. 3h-j). By contrast, ten-a knockdown in VA1d/ 
DC3 PNs (normally Ten-a low) did not cause mismatching (Fig. 3j 
and Supplementary Fig. 8a, b). Similarly, MARCM loss-of-function of 
ten-a mutant in DA1 but not in VA1d/DC3 PNs resulted in mismatching 
with Or47b ORNs (Fig. 3j and Supplementary Fig. 8c, d). Thus, 
removal of ten-a from Ten-a-high DA1 PNs caused their dendrites to 
mismatch with Ten-a-low Or47b ORNs (Fig. 3i). The differential 
requirements of Ten-m and Ten-a in ORNs or PNs in preventing 
Mz19-Or47b mismatching probably reflect differential expression of 
Ten-m and Ten-a in the mismatching partners. 

Our finding that loss of ten-a caused Ten-a-high PNs to mismatch 
with Ten-a-low ORNs (Fig. 3i, j), together with the matching expres- 
sion of Teneurin proteins in PNs and ORNs, raised the possibility that 
Teneurins instruct class-specific PN-ORN connections through 
homophilic attraction: PNs expressing high-level Ten-m or Ten-a 
connect to ORNs with high-level Ten-m or Ten-a, respectively. 


Teneurins instruct matching specificity 


Figure 4 | Teneurin overexpression in specific PN classes causes 
mismatching. a-l, Mismatching phenotypes following Ten-m (a-f) or Ten-a 
(g-l) overexpression in different PN classes. Specific PN classes are labelled by 
MARCM with Mz19-GAL4 and ORN axons using Or47b-rCD2 (a, d) or Or23a- 
mCD8GFP (g, j). In control, Mz19 PNs do not intermingle with Or47b ORNs 
(Fig. 1b). MARCM overexpression of Ten-m in DA1 PNs (a, arrowhead), but 
not VA1d/DC3 PNs (d), causes dendrite mismatching with Or47b axons. 
MARCM overexpression of Ten-a in VA1d/DC3 PNs (j, arrowhead), but not in 

This homophilic attraction hypothesis predicts that overexpression of 
a given Teneurin in PNs (1) should preferentially affect PNs normally 
expressing low levels of that Teneurin, causing their dendrites to lose 
endogenous connections with their cognate ORNs, and (2) should 
cause these PNs to make ectopic connections with ORNs expressing 
high levels of that Teneurin. 
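
The two predictions can be made concrete with a toy model. This is a hedged illustration only and not the authors' analysis: expression is reduced to binary high/low levels, the three PN-ORN pairs and their levels are those stated in the text (DA1/Or67d: Ten-m low, Ten-a high; VA1d/Or88a and VA1lm/Or47b: Ten-m high, Ten-a low), and the `predict` function is a hypothetical helper.

```python
# Binary Teneurin levels (True = high) for three PN-ORN pairs, as stated in
# the text. Treating expression as binary is a deliberate simplification.
LEVELS = {
    "DA1/Or67d": {"Ten-m": False, "Ten-a": True},
    "VA1d/Or88a": {"Ten-m": True, "Ten-a": False},
    "VA1lm/Or47b": {"Ten-m": True, "Ten-a": False},
}


def predict(pn_pair, teneurin):
    """Apply predictions (1) and (2) to overexpression of one Teneurin in one PN class."""
    if LEVELS[pn_pair][teneurin]:
        # The PN class is already high for this Teneurin: no change is predicted.
        return {"loses_cognate_orn": False, "ectopic_partners": []}
    # Prediction (2): candidate ectopic partners are the other ORN classes
    # that are high for the overexpressed Teneurin.
    ectopic = sorted(
        pair.split("/")[1]
        for pair, levels in LEVELS.items()
        if pair != pn_pair and levels[teneurin]
    )
    # Prediction (1): the PN no longer matches its low-level cognate ORN.
    return {"loses_cognate_orn": True, "ectopic_partners": ectopic}


ten_m_in_da1 = predict("DA1/Or67d", "Ten-m")
ten_m_in_va1d = predict("VA1d/Or88a", "Ten-m")
```

Under these assumptions, Ten-m overexpression in DA1 PNs is predicted to weaken the DA1-Or67d connection and favour ectopic contacts with Or47b and Or88a, whereas Ten-m overexpression in VA1d PNs is predicted to change nothing.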

To test the first prediction, we examined whether Teneurin over- 
expression in Mz19 PNs impaired their endogenous connections with 
cognate ORNs. Consistent with our prediction, Ten-m overexpression 
specifically disrupted the connections of DA1 PNs and Or67d ORNs, a 
PN-ORN pair expressing low-level Ten-m (Supplementary Fig. 9b, e). 
Connections of the other two pairs were unaffected (Supplementary 
Fig. 9a, c, d, f). Likewise, Ten-a overexpression specifically disrupted 
connections between VA1d PNs and Or88a ORNs, a PN-ORN pair 
expressing low-level Ten-a (Supplementary Fig. 9g), but not between 
the other two PN-ORN pairs (Supplementary Fig. 9h, i). 

To test the second prediction, we examined the specificity of 
ectopic connections made by Mz19 PNs overexpressing Teneurins, 
and sampled five non-partner ORN classes that project axons to the 
vicinity of Mz19 dendrites (Supplementary Fig. 10). We found that 
Ten-m overexpression in Mz19 PNs caused their dendrites to mismatch 
only with Or47b ORNs (Supplementary Fig. 10f). To examine addi- 
tional mismatching phenotypes that may occur within Mz19 glomeruli 
and to determine whether DA1 or VA1d/DC3 PNs contribute to the 
ectopic connections, we used MARCM to overexpress Ten-m in indi- 
vidual PN classes. We found that Ten-m overexpression in DA1 PNs 
(Ten-m low) caused their dendrites to mismatch with Or47b (Fig. 4a, b) 
and (to a lesser extent) Or88a ORNs (Fig. 4b, c), both endogenously 
expressing high-level Ten-m. By contrast, Ten-m overexpression in 
VA1d/DC3 PNs did not produce ectopic connections with any non- 
matching ORNs tested (Fig. 4d-f). 

Likewise, Ten-a overexpression in Mz19 PNs caused their dendrites 
to mismatch only with Or23a ORNs among all non-matching ORN 
classes sampled outside the Mz19 region (Supplementary Fig. 10l). 


[Figure 4, panels b-l. Schematic panels (b, e, h, k): PN map and ORN map. Quantification panels (c, f, i, l): penetrance (%) and intermingling percentage for mismatching with DA1 PNs or with VA1d/DC3 PNs following Ten-m or Ten-a overexpression.]

DA1 PNs (g), causes their dendrites to mismatch with Or23a axons. P{GS}9267 
and P{GE}1914 (Supplementary Fig. 2) are used to overexpress Ten-m and 
Ten-a, respectively. c, f, i, l, Quantification of mismatching phenotypes (n = 9 
for each). Error bars represent s.e.m. See Supplementary Fig. 10 for details on 
some genotypes quantified here. b, e, h, k, Schematic summarizing the 
mismatching phenotypes in Fig. 4 and Supplementary Figs 9 and 10. Blue, Ten-m 
high; orange, Ten-a high. Scale bars, 10 µm. 
Further, MARCM overexpression of Ten-a in VA1d/DC3 PNs (Ten-a 
low) caused their dendrites to mismatch specifically with Or23a 
(Fig. 4j, k) and (to a lesser extent) Or67d ORNs (Fig. 4k, l), both 
endogenously expressing high-level Ten-a (Fig. 4l). By contrast, Ten-a 
overexpression in DA1 PNs (Ten-a high) did not produce ectopic 
connections with any non-matching ORNs tested (Fig. 4g-i). Thus, 
both Ten-m and Ten-a overexpression analyses support the homophilic 
attraction hypothesis. 

Our data also suggest that additional molecule(s) are required to 
determine completely the wiring specificity of the five PN-ORN pairs 
examined. For example, VA1d-Or88a and VA1lm-Or47b have 
indistinguishable Ten-m/Ten-a expression patterns (Fig. 2j), and may 
require additional molecules to distinguish target choice. Indeed, 
Ten-a knockdown (Fig. 3h-j and Supplementary Fig. 8e, f) or Ten-m 
overexpression (Fig. 4b, c) caused DA1 PNs to mismatch preferentially 
with Or47b as opposed to Or88a axons. This suggests that the 
non-adjacent DA1 and VA1lm share a more similar Teneurin-independent 
cell-surface code than the adjacent VA1d and VA1lm. Likewise, Ten-a 
overexpression caused VA1d PNs to mismatch with the non-adjacent 
Or23a more so than the adjacent Or67d ORNs, even though both 
ORNs express high-level Ten-a (Fig. 4k, l). Finally, Ten-m overexpression 
in DC3 PNs, which express low-level Ten-m, did not change its 
matching specificity (Fig. 4f and Supplementary Fig. 9f), suggesting 
that Teneurin-independent mechanisms are involved in matching 
DC3 PNs and Or83c ORNs. 

In summary, we showed that Teneurin overexpression in Teneurin- 
low PNs caused their dendrites to lose endogenous connections 
with Teneurin-low ORNs and mismatch with Teneurin-high ORNs 
(Fig. 4b, k). However, Teneurin overexpression in Teneurin-high PNs 
did not disrupt their proper connections (Fig. 4e, h). These data indi- 
cate that Teneurins instruct connection specificity, probably through 
homophilic attraction, by matching Ten-m or Ten-a levels in PN and 
ORN partners. 


Ten-m promotes PN-ORN homophilic attractions 


To test whether Teneurins interact in vitro, we separately transfected 
two populations of Drosophila S2 cells with Flag- and haemagglutinin 
(HA)-tagged Teneurins, and performed co-immunoprecipitations 
from lysates of these cells after mixing. We detected strong homophilic 
interactions between Flag- and HA-tagged Ten-m proteins and, to a 
lesser extent, between Flag- and HA-tagged Ten-a proteins (Fig. 5a). 
Ten-m and Ten-a also exhibited heterophilic interactions (Fig. 5a), 
which may account for their role in synapse organization. 

Next, we tested whether Teneurins can homophilically promote 
in vivo trans-cellular interactions between PN dendrites and ORN 
axons. We simultaneously overexpressed Ten-m in Mz19 PNs using 
Mz19-QF, and Or67a and Or49a ORNs using AM29-GAL4 (ref. 32; 
Fig. 5b). This enabled us to label and manipulate independently Mz19 
dendrites and AM29 axons with distinct markers and transgenes. We 
chose AM29-GAL4 because of its early onset of expression, whereas 
other class-specific ORN drivers start to express only after PN-ORN 
connection is established. AM29 axons do not normally connect 
with Mz19 dendrites (Fig. 5c, d). 

Simultaneous overexpression of Ten-m in both Mz19 PNs and 
AM29 ORNs produced ectopic connections between them (Fig. 5c, g), 
suggesting that Ten-m homophilically promotes PN-ORN attraction. 
By contrast, Ten-m overexpression only in PNs or ORNs did not 
produce any ectopic connections, despite causing dendrite or axon 
mistargeting, respectively (Fig. 5c, e, f). These data ruled out the 
involvement of heterophilic partners in Ten-m-mediated attraction. 
Simultaneous overexpression of Ten-a in Mz19 PNs and AM29 ORNs 
did not produce ectopic connections (data not shown), possibly due to 
lower expression or weaker Ten-a homophilic interactions (Fig. 5a). 
Although heterophilic interactions between Ten-m and Ten-a also 
occur in vitro (Fig. 5a), heterophilic overexpression of Ten-m and 
Ten-a in AM29 ORNs and Mz19 PNs did not produce ectopic connections 
(data not shown). 

Figure 5 | Ten-m promotes homophilic interactions in vitro and in vivo. 
a, Co-immunoprecipitation of Flag- and HA-tagged Teneurin proteins from 
separately transfected S2 cells. Co-immunoprecipitated HA-tagged Teneurin 
proteins are detected by anti-HA antibody and immunoprecipitated Flag- 
tagged Teneurin proteins by anti-Flag antibody (top two blots). The input 
lysates are immunoblotted for HA and Flag (bottom two blots). b, Schematic 
showing the relative positions of glomeruli targeted by Mz19 PN dendrites 
(green) and AM29 ORN axons (red). c, Quantification of mismatching between 
Mz19 PNs and AM29 ORNs (n = 10 for each condition). OE, overexpression. 
d, In control, Mz19 dendrites do not connect with AM29 axons. 
e, f, Overexpression of Ten-m only in AM29 ORNs (e) or Mz19 PNs (f) does 
not produce mismatching between them. Following Ten-m overexpression, 
AM29 axons mistarget posteriorly to Mz19 dendrites and are therefore not 
visible in e. g, Simultaneous overexpression of Ten-m in both PNs and ORNs 
produces ectopic Mz19-AM29 connections (arrowhead). Schematics on the 
right show the Mz19-AM29 connectivity in different conditions. h, The 
synaptic vesicle marker Synaptotagmin is enriched at these Mz19-AM29 
ectopic connections. AM29 axons are labelled by AM29-GAL4 with UAS-mtdT 
to visualize the entire axonal processes and UAS-synaptotagmin-HA to 
visualize synaptic vesicles in axon terminals. Mz19 dendrites are labelled by 
Mz19-QF driving QUAS-mCD8GFP. To overexpress Ten-m, P{GS}9267 and 
QUAS-ten-m (Supplementary Fig. 2) are driven by AM29-GAL4 and Mz19-QF, 
respectively. Scale bars, 10 µm. 

Finally, we examined whether these ectopic connections lead to 
the formation of synaptic structures. Indeed, the ectopic connec- 
tions between Mz19 dendrites and AM29 axons were enriched in 
synaptotagmin-HA expressed from AM29 ORNs (Fig. 5h), suggest- 
ing that these connections can aggregate synaptic vesicles and could 
be functional. We propose that Teneurins promote attraction between 
PN-ORN synaptic partners through homophilic interactions, even- 
tually leading to synaptic connections. 


Discussion 

Compared to axon guidance, relatively little is known about synaptic 
target selection mechanisms. Among the notable examples, the 
graded expressions of vertebrate EphA and ephrin-A instruct the 
topographic targeting of retinal ganglion cell axons. Chick 
DSCAM and Sidekick promote lamina-specific arborization of retinal 
neurons. Drosophila Capricious promotes target specificity of 
photoreceptor and motor axons. Caenorhabditis elegans SYG-1 
and SYG-2 specify synapse location through interaction between 
presynaptic axons and intermediate guidepost cells. However, it is 
unclear whether any of these molecules mediate direct, selective 
interactions between individual pre- and postsynaptic partners. 
Indeed, in complex neural circuits, it is not clear a priori whether 
molecular determinants mediate such interactions. For example, the 
final retinotopic map is thought to result from both ephrin signalling 
and spontaneous activity. Mammalian ORN axon targeting 
involves extensive axon-axon interactions through activity-dependent 
and independent modes, with minimal participation of 
postsynaptic neurons identified thus far. 

Here, we show that Teneurins instruct PN-ORN matching through 
homophilic attraction. Although each glomerulus contains many 
synapses between cognate ORNs and PNs, these synapses transmit 
the same information and can be considered identical with regard to 
specificity. Thus, Teneurins represent a strong case in determining 
connection specificity directly between pre- and postsynaptic neurons. 
We further demonstrate that molecular determinants can instruct 
connection specificity of a moderately complex circuit at the level of 
individual synapses. 

Our study reveals a requirement for PN-ORN attraction in the 
stepwise assembly of the olfactory circuit. PN dendrites and ORN 
axons first independently project to appropriate regions using global 
cues, dendrite-dendrite and axon-axon interactions. The initial, 
independent targeting of PN dendrites and ORN axons is eventually 
coordinated in their final one-to-one matching. We identified 
Teneurins as the first molecules to mediate this matching process, 
through direct PN-ORN attraction. Our analyses have focused on a 
subset of PN-ORN pairs involving trichoid ORNs, including Or67d, 
Or88a and Or47b that have been implicated in pheromone sensation. 
The partially overlapping expressions of Teneurins in other PN and 
ORN classes (Fig. 2 and Supplementary Fig. 4) suggest a broader 
involvement of Teneurins. At the same time, additional cell-surface 
molecules are also needed to determine completely connection specificity 
of all 50 PN-ORN pairs. 

Teneurins are present throughout Animalia (Fig. 1h). Different 
vertebrate teneurins are broadly expressed in distinct and partially 
overlapping patterns in the nervous system. Teneurin-3 is expressed 
in the visual system and is required for ipsilateral retinogeniculate 
projections. Our study suggests that differential Teneurin expression 
may have a general role in matching pre- and postsynaptic partners. 
Indeed, high-level Ten-m is involved in matching select motor neurons 
and muscles. Furthermore, Ten-m and Ten-a also trans-synaptically 
mediate neuromuscular synapse organization. This suggests that the 
synapse partner matching function of Teneurins may have evolved 
from their basal role in synapse organization. Interestingly, synaptic 
partner matching only involves homophilic interactions (this study 
and ref. 26), whereas synapse organization preferentially involves 
heterophilic interactions. This could not be fully accounted for by 
the different strength of their homophilic and heterophilic interactions 
in vitro (Fig. 5a). We speculate that these dual functions of Teneurins 
in vivo may engage signalling mechanisms that further distinguish 
homophilic versus heterophilic interactions. 
METHODS SUMMARY 


Detailed methods on fly stocks, generation of the ten-a allele, construction of 
transgenic flies, clonal analysis, histology, imaging, quantification and statistical 
analysis, epitope-tagged constructs, and co-immunoprecipitation can be found in 
Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Fly stocks. Mz19-GAL4 (ref. 7) was used to label PNs. Or-rCD2 lines 
(Or47b-rCD2 and Or88a-rCD2), Or-mCD8GFP lines (Or23a-mCD8GFP, 
Or43a-mCD8GFP, Or46Aa-mCD8GFP, Or56a-mCD8GFP and Or83c-mCD8GFP), 
Or67d-GAL4 (ref. 49) and AM29-GAL4 (ref. 32) were used to label ORNs. 
GH146-Flp (ref. 9), ey-Flp (ref. 50), UAS>stop>mCD8GFP (ref. 9) and 
tubP>stop>GAL80 (ref. 51) were used to perform the intersectional expression 
analysis. P{GS}9267 (ten-m) was generated by the Drosophila Gene Search Project 
(Metropolitan University) and P{GE}1914 (ten-a) was from the GenExel 
collection of EP lines generated by the Korean Advanced Institute of Science 
and Technology. Their ability to drive the overexpression of each respective 
Teneurin was verified by elevated antibody staining. 

All RNAi lines targeting ten-m or ten-a from the Vienna Drosophila RNAi 
Center (UAS-ten-mRNAi-V51173 and UAS-ten-aRNAi-V32482), the Bloomington 
Drosophila Stock Center (UAS-ten-mRNAi-JF03323 and UAS-ten-aRNAi-JF03375), 
and the National Institute of Genetics Fly Stock Center (UAS-ten-mRNAi-5723 
and UAS-ten-aRNAi-2590) were collected. The efficiency of all RNAi lines was 
tested by pan-neuronal expression using C155-GAL4 followed by Ten-m or 
Ten-a antibody staining. UAS-ten-mRNAi-V51173 and UAS-ten-mRNAi-JF03323 
targeting ten-m, and UAS-ten-aRNAi-V32482 and UAS-ten-aRNAi-JF03375 targeting 
ten-a were able to eliminate respective antibody staining beyond detection. 
UAS-ten-mRNAi-V51173 and UAS-ten-aRNAi-V32482 were used in all the experiments 
except Supplementary Fig. 6, in which UAS-ten-mRNAi-JF03323 and 
UAS-ten-aRNAi-JF03375 were used to confirm the RNAi phenotypes. UAS-Dcr2 (ref. 53) 
was used to enhance RNAi efficiency. 

To identify ten-m-GAL4, we collected a group of GAL4 enhancer traps 
located near the 5’ end of the ten-m gene. Their expression patterns were deter- 
mined using a membrane-tagged GFP reporter UAS-mCD8GFP gene from the 
Drosophila Genetic Resource Center. NP6658-GAL4, which recapitulated the 
glomerulus-specific Ten-m staining pattern, was identified and referred to as 
ten-m-GAL4. 

Generation of the ten-a allele. A small deficiency of ten-a was generated by FRT- 
mediated excision. This deficiency allele is homozygous viable and contains a 
deletion between P{XP}d07540 and PBac{WH}f01428, which flanks the entire 
ten-a genomic region (~140kb) and four additional predicted genes (Sup- 
plementary Fig. 2a), and is referred to as Df(X)ten-a. The deletion was verified 
by both PCR and antibody staining against Ten-a. The loss-of-function pheno- 
types were due to the loss of ten-a rather than the four additional predicted genes, 
as they mimicked the RNAi phenotypes. 

Construction of UAS and QUAS transgenic flies. Ten-m and Ten-a coding 
sequences were amplified from the cDNA constructs. One primer amplified 
from the start codon and added a CACC overhang for the TOPO reaction and a 
Kozak sequence. The other primer amplified to the stop codon. The PCR pro- 
ducts were subcloned into pENTR-D/TOPO (Invitrogen). A 46-bp irrelevant 
fragment was found in the middle of the ten-m coding sequence in the original 
cDNA construct, and was removed by replacing a small region containing this 
fragment with the corresponding region in the ten-m genomic DNA. To make 
UAS-ten-m, UAS-ten-a, QUAS-ten-m and QUAS-ten-a, pENTR-ten-m and 
pENTR-ten-a were recombined into destination vector pUASt-Gateway-attB 
and pQUASt-Gateway-attB using LR Clonase II (Invitrogen). The destination 
vector pQUASt-Gateway-attB was constructed by replacing the UAS site in 
pUASt-Gateway-attB with a QUAS site. All constructs were sequence verified. 
All the UAS and QUAS transgenes were integrated into both attP24 and 86Fb 
landing sites on second and third chromosomes, respectively. All transgenic 
flies were verified by PCR and overexpression followed by antibody staining. The 
UAS and QUAS transgenes inserted in the 86Fb site were used in this paper. 
BAC recombineering to construct Mz19-QF. A 110-kb BAC (#CH321-85L03) 
in the attB-P[acman]-CmR vector, which contains genomic DNA that covers the 
Mz19-GAL4 enhancer trap insertion site, was collected from the BACPAC 
Resources Center. The QF coding sequence, with a P-element minimal promoter 
and an hsp70 polyA, was amplified using primers containing 50-bp arm sequences 
allowing site-specific recombination. The 5-kb PCR product was recombined into 
the 110-kb genomic BAC using bacterial BAC recombineering and was verified by 
sequencing. The 115-kb Mz19-QF BAC was further verified by digestion pattern 
analysis and used to produce transgenes at the VK37 landing site on the second 
chromosome by BestGene. The Mz19-QF transgenic flies were verified by PCR and 
the expression of reporters QUAS-mCD8GFP or QUAS-mTdt3HA. 

Clonal analysis. To determine the contribution of individual PNs to the ectopic 
connections with ORNs, the MARCM method was applied. Briefly, heat-shock-induced 
Flp activity caused mitotic recombination of the FRT chromosome arm such 
that one of the daughter cells lost GAL80. This cell (and its progeny) can therefore be 
labelled by the GAL4-UAS system. For generating neuroblast clones, flies were heat- 
shocked between 24-48 h after egg laying for 1h at 37 °C. Mz19-GAL4 labels VA1d 


and DC3 from the anterodorsal neuroblast and DA1 from the lateral neuroblast. By 
generating neuroblast clones at 24-48 h after egg laying, we used Mz19-GAL4 to 
specifically label DA1 or VA1d/DC3 PNs and simultaneously express RNAi targeting 
ten-a, or overexpress Ten-m or Ten-a in the labelled neurons. 

In the ten-a mutant analysis, Df(X)ten-a was placed in trans to GAL80 on the 
FRT chromosome arm. Upon Flp-induced mitotic recombination, one of the 
daughter cells became homozygous for ten-a and simultaneously lost GAL80. 
We used Mz19-GAL4 to specifically label DA1 or VA1d/DC3 mutant PNs. 

Different classes of ORNs, except for Or67d, were labelled by Or-mCD8GFP 
transgenes in a GAL4-independent manner, allowing the visualization of the 
specific matching between the labelled PNs and ORNs. Owing to the lack of an 
Or67d-mCD8GFP, Or67d ORNs were labelled by Or67d-GAL4 and Ten-m overexpression 
was achieved by using Mz19-QF to drive QUAS-ten-m (Supplementary 
Fig. 9). In Teneurin overexpression by Mz19-QF, Or67d-GAL4 expression 
was found unchanged compared with the control, and co-localized with Ncad 
staining in the DA1 glomerulus, which can be unambiguously identified 
(Supplementary Fig. 9). Therefore, Ncad staining in the DA1 glomerulus was 
used to determine the location of Or67d ORNs in Fig. 4f, l, in which Teneurins 
were overexpressed by Mz19-GAL4. 

Histology. The procedures used for fixation and immunostaining were described 
previously. For primary antibodies, we used mouse nc82 (1:30), rat antibody to 
N-cadherin (1:40), rat antibody to mCD8 (1:100), mouse antibody to rCD2 
(1:200), chicken antibody to GFP (1:1,000), mouse antibody to HA (1:1,000), 
rabbit antibody to HA (1:1,000), rabbit antibody to DsRed (1:500), mouse antibody 
(mAb20) to Ten-m (1:3,000), and guinea pig antibody to Ten-a (1:100). 
Neuropil staining indicates the antennal lobe, where PN dendrites and ORN axons 
are located. Fluorescent labelling outside the antennal lobe may come from labelled 
PN cell bodies or non-specific tissues. 

Imaging, quantification and statistical analysis. Immunostained brains were 
imaged with a Zeiss LSM 510 Meta laser-scanning confocal microscope. Images 
of antennal lobes were taken as confocal stacks with 1-µm-thick sections. 
Representative single sections were shown to illustrate the matching and mismatch- 
ing between PN dendrites and ORN axons. Penetrance of phenotypes represents the 
percentage of animals in which at least one antennal lobe showed a given phenotype 
among the total animals examined. Percentage of intermingling represents the 
fraction of labelled dendrites located within the axonal area of a given ORN class, 
and was measured by dividing dendritic area by total axonal area in a single confocal 
plane that shows maximum intermingling between dendrites and axons. Statistical 
significance between two samples was determined by the unpaired Student’s t-test. 
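
The quantification described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the per-animal tuple of lobe scores and the example numbers are all hypothetical, and the t-test helper returns only the pooled-variance t statistic and degrees of freedom (a statistics package would be needed for the P value).

```python
from math import sqrt
from statistics import mean, variance


def penetrance(animals):
    """Percentage of animals in which at least one antennal lobe shows the phenotype.

    `animals` is a list of (lobe_1, lobe_2) booleans; this layout is an
    assumption made for illustration.
    """
    affected = sum(1 for lobes in animals if any(lobes))
    return 100.0 * affected / len(animals)


def intermingling_percentage(dendritic_area_in_axons, total_axonal_area):
    """Dendritic area inside the ORN axonal region, as a percentage of that region."""
    return 100.0 * dendritic_area_in_axons / total_axonal_area


def students_t(sample_a, sample_b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical example values, for illustration only.
example_penetrance = penetrance([(True, False), (False, False), (True, True), (False, False)])
example_intermingling = intermingling_percentage(2.0, 8.0)
example_t, example_df = students_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Scoring penetrance per animal rather than per lobe follows the definition in the text, in which an animal counts as affected if either antennal lobe shows the phenotype.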
Flag- and HA-tagged constructs. To express Flag- and HA-tagged proteins in S2 
cells, the Gateway destination vectors pUASt-Flag-Gateway(-w) and pUASt-HA- 
Gateway(-w) were generated by removing a ~4.5-kb non-essential fragment 
between two DraIII sites that contains the white gene from the original 
Gateway vectors pTFW and pTHW (Drosophila gateway collection, DGRC, 
Bloomington), respectively. The modified destination vectors are ~40% smaller 
than the original ones while preserving all the essential components for S2 cell 
expression, and showed greater transfection and expression efficiency in S2 cells. 
To express Flag- and HA-tagged Teneurin proteins in S2 cells, pENTR-ten-m and 
pENTR-ten-a were recombined into modified destination vectors pUASt-Flag- 
Gateway(-w) and pUASt-HA-Gateway(-w) using LR Clonase II (Invitrogen). All 
expression constructs, including UAS-Flag-ten-m, UAS-Flag-ten-a, UAS-HA- 
ten-m and UAS-HA-ten-a, were sequence verified. 

Co-immunoprecipitation assay. S2 cells were cultured in Schneider’s insect 
medium (Sigma) according to the manufacturer’s description. UAS-Flag-ten-m, 
UAS-Flag-ten-a, UAS-HA-ten-m or UAS-HA-ten-a constructs were separately trans- 
fected into S2 cells, along with an Actin5c-GAL4 vector, using the Effectene transfec- 
tion reagent (QIAGEN). The amount of each construct and the number of cells used 
for transfection were adjusted to ensure comparable expression levels of Ten-m and 
Ten-a proteins. Three days after transfection, separately transfected cells were har- 
vested, mixed together, and incubated for 1 h at room temperature (25 °C). Equivalent 
amounts of untransfected cells were used as controls, and the final mixtures contained 
the same total amount of cells under all co-immunoprecipitation conditions. The 
mixed cells were lysed in lysis buffer (50 mM Tris-HCl pH 7.4, 10 mM MgCl2, 
150 mM NaCl, 1 mM EGTA, 10% glycerol) supplemented with 0.5% Nonidet P-40 
and protease inhibitor cocktail (Sigma). The cell lysates were then incubated with 
EZview Red anti-Flag M2 affinity gel (Sigma) for 3 h at 4 °C with rotation. The samples 
were washed extensively in lysis buffer. The proteins were eluted in 2% SDS elution 
buffer, and were detected using western blot analysis using rat antibody to HA 
(1:1,000, Roche), mouse antibody to Flag (1:5,000, Sigma), and HRP-conjugated-goat 
antibodies to rat or mouse primaries (both at 1:20,000, Jackson ImmunoResearch) 
Structure of the mitotic checkpoint complex 


William C. H. Chao, Kiran Kulkarni, Ziguo Zhang, Eric H. Kong & David Barford 


In mitosis, the spindle assembly checkpoint (SAC) ensures genome stability by delaying chromosome segregation until 
all sister chromatids have achieved bipolar attachment to the mitotic spindle. The SAC is imposed by the mitotic 
checkpoint complex (MCC), whose assembly is catalysed by unattached chromosomes and which binds and inhibits 
the anaphase-promoting complex/cyclosome (APC/C), the E3 ubiquitin ligase that initiates chromosome segregation. 
Here, using the crystal structure of Schizosaccharomyces pombe MCC (a complex of mitotic spindle assembly checkpoint 
proteins Mad2, Mad3 and APC/C co-activator protein Cdc20), we reveal the molecular basis of MCC-mediated APC/C 
inhibition and the regulation of MCC assembly. The MCC inhibits the APC/C by obstructing degron recognition sites on 
Cdc20 (the substrate recruitment subunit of the APC/C) and displacing Cdc20 to disrupt formation of a bipartite D-box 
receptor with the APC/C subunit Apc10. Mad2, in the closed conformation (C-Mad2), stabilizes the complex by optimally 
positioning the Mad3 KEN-box degron to bind Cdc20. Mad3 and p31comet (also known as MAD2L1-binding protein) 
compete for the same C-Mad2 interface, which explains how p31comet disrupts MCC assembly to antagonize the SAC. This 
study shows how APC/C inhibition is coupled to degron recognition by co-activators. 


The fidelity of chromosome separation in mitosis is governed by an 
evolutionarily conserved cell-cycle checkpoint mechanism called the 
SAC. The SAC arrests the mitotically dividing cell to allow complete 
chromosome attachment to the bipolar mitotic spindle. The essence of 
the SAC is to block the onset of anaphase by inhibiting APC/C-mediated 
ubiquitin-dependent degradation of securin and mitotic 
cyclin. Components of the SAC that are responsible for detecting 
unattached kinetochores and for propagating signals to the APC/C 
have been identified, but the molecular basis underlying these 
processes is only partially understood. Mad2 and Mad3 (BubR1 in 
metazoans) mediate APC/C inhibition through their association with 
its co-activator subunit Cdc20 (refs 5-12). Mad2, Mad3 and Cdc20 
(together with mitotic checkpoint protein Bub3) form the MCC that 
directly binds the APC/C to inhibit substrate recognition. Mad2 
and Mad3 cooperate to antagonize Cdc20-dependent activation of the 
APC/C, with Mad3-Cdc20 interactions requiring the pre-assembly 
of a Cdc20-Mad2 complex. Thus, SAC signalling occurs through 
the generation of the Cdc20-Mad2 complex, a process initiated by 
Mad1, which is the Mad2 receptor at unattached kinetochores. 
Central to the association of Mad2 with Cdc20 is the inter-conversion 
of Mad2 between the open (O-Mad2) and closed (C-Mad2) structural 
states. These states of Mad2 differ in the topology of a carboxy-terminal 
β-sheet that repositions in C-Mad2 to enable binding to its 
protein ligands, Mad1 or Cdc20 (refs 20-22). In the template model for 
SAC activation, Mad1 interacts with C-Mad2, generating the C-Mad2- 
Mad1 complex that subsequently recruits O-Mad2 through the 
C-Mad2 dimerization interface. By inducing the conformational trans- 
ition of O-Mad2 to C-Mad2, the Mad1-bound C-Mad2 subunit cata- 
lyses the binding of Mad2 to Cdc20 (refs 17, 23). 

APC/C activity and its substrate recruitment are dependent on its 
co-activators (Cdc20 and Cdh1), which recognize APC/C substrates 
through two destruction motifs (degrons): the D box and the KEN 
(Lys-Glu-Asn) box. Mad3 contains a KEN box that is essential for 
MCC assembly, suggesting that Mad3 may act as a pseudosubstrate 
to block substrate recognition by APC/C^Cdc20. However, other studies 
showing that the promotion of ubiquitin-mediated degradation of 
Cdc20 by the SAC is dependent on the KEN box of Mad3 (refs 18, 
27, 29) indicate that there is a more complex mechanism controlling 
APC/C^Cdc20 activity. The mechanisms underlying APC/C activation 
after SAC silencing are also poorly understood. In metazoans, p31comet 
antagonizes the SAC by functioning as a structural mimic of Mad2 
that binds at the Mad2 dimerization interface to inhibit the 
conformational activation of O-Mad2 (ref. 31). UbcH10, assisted by p31comet, 
catalyses Cdc20 ubiquitination, which leads to the disassembly of 
the MCC. 

To understand the molecular mechanisms underlying the mitotic 
checkpoint complex, we determined the crystal structure of the fission 
yeast MCC. The structure shows how Mad2 and Mad3 cooperatively 
inhibit Cdc20, and indicates how p31comet would antagonize MCC 
assembly. The structure of Cdc20 in the context of the MCC offers the 
first opportunity to visualize degron recognition by co-activators. The 
interaction between Mad2 and Mad3 positions the Mad3 KEN box 
towards the KEN-box receptor of the Cdc20 WD40 domain. 
Additionally, an unexpected D-box mimic of the Mad3 C terminus 
reveals the D-box-binding site on Cdc20, thus demonstrating the 
structural basis of D-box recognition by co-activators. 


Overall structure of the MCC 

We generated the fission yeast MCC by co-expressing Cdc20, Mad2 
and Mad3 in insect cells. The complex comprises Cdc20 with all 
functional domains (C box, Mad2-binding motif, WD40 domain 
and Ile-Arg tail (Supplementary Fig. 1)), Mad2 locked in its closed 
conformation that facilitates binding to Cdc20 (ref. 31), and Mad3 
truncated after the tetratricopeptide repeat (TPR) domain and thus 
lacking its C-terminal KEN box. Bub3 was omitted from the complex 
because previous studies indicated that Bub3 was not an integral part 
of MCC in fission yeast and was not required for MCC-mediated 
inhibition of human APC/C. 


1 Division of Structural Biology, Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London, SW3 6JB, UK. 
The crystal structure of the MCC, at 2.3 Å resolution, shows that 
Cdc20, Mad2 and Mad3 assemble into a triangular heterotrimer 
(Fig. 1 and Supplementary Table 1). Mad3 coordinates the overall 
organization of the complex by forming numerous inter-subunit 
interactions with Mad2 and Cdc20, whereas Cdc20 and Mad2 interact 
primarily through the sequestering of the Mad2-binding motif of 
Cdc20 by the Mad2 ‘safety belt’ (refs 21, 22). The core architecture 
of Mad3 is a contiguous TPR superhelix with three TPR motifs 
flanked by capping α-helices that resembles the BubR1 amino 
terminus. In addition, and not present in the BubR1 structure, the 
conserved N terminus of Mad3, which incorporates the N-terminal KEN 
box that is essential for MCC assembly, adopts a helix-loop-helix 
(HLH) motif (Fig. 1 and Supplementary Fig. 2). The HLH motif 
simultaneously binds Mad2 and Cdc20, orienting the KEN box 
towards its receptor on Cdc20 (Fig. 1a, b). Mad3 also contacts 
Mad2 and Cdc20 through its TPR domain (Fig. 1a, c). 

The WD40 domain of Cdc20 conforms to a canonical seven-bladed 
β-propeller (Fig. 1a). The Mad2-binding motif, N-terminal to the 
WD40 domain, is structurally well defined, engaging the safety 
belt of Mad2 and adopting a conformation that is similar to the 
Mad2-binding motif of Mad1 bound to Mad2 (ref. 22) (Fig. 2a). The 
linker connecting the Mad2-binding motif with the N terminus of the 
WD40 domain is disordered, as are the two APC/C-interacting motifs: 
a conserved C box that is immediately N-terminal to the Mad2-binding 
motif, and the C-terminal Ile-Arg tail that interacts with the TPR 
subunit Cdc27 (Supplementary Fig. 1). 

Mad2 is an α/β-HORMA-class protein (Fig. 1a and Supplementary 
Fig. 3). Its C-Mad2 conformation enables interactions with the 


Mad2-binding motifs of Cdc20 and Mad1 and, as revealed in the 
MCC crystal structure, is the only state of Mad2 that can recognize 
Mad3. Mad2 interacts through its α-C helix and β8′-β8″ hairpin 
with the HLH motif of Mad3 (Fig. 2b). Notably, such Mad2 interactions 
resemble those between C-Mad2 and p31comet (ref. 31) 
(Fig. 2c), and C-Mad2 in an asymmetric C-Mad2-O-Mad2 dimer 
(Fig. 2d). The involvement of the β8′-β8″ hairpin (a region of Mad2 
that undergoes substantial conformational change upon transition 
from the open to closed states) at the Mad2-Mad3 interface indicates 
that Mad3 binds exclusively to C-Mad2. 

Mutations of the Mad2 α-C helix disrupt C-Mad2-Mad3 interactions, 
consistent with our structure. To test the Mad3 interface, 
we replaced Met 13 with Arg (Fig. 2b). This mutation dissociated 
Mad3 from a Cdc20-C-Mad2 heterodimer when size-exclusion 
chromatography was performed (Supplementary Fig. 4). Together, 
these data confirm the physiological relevance of the C-Mad2-Mad3 
interface and show that C-Mad2 is required to confer high-affinity 
binding of Mad3 to Cdc20, consistent with studies showing that 
Mad3 association with Cdc20 in vivo is synergistic with Mad2 (refs 
12, 15, 16). 


The Cdc20 WD40 domain is a receptor for KEN and D box 


We identified two highly conserved surfaces on the Cdc20 WD40 
domain, and these are responsible for APC/C degron recognition: 
the KEN-box receptor, situated on the top side of the WD40 domain 
at the centre of the β-propeller (Fig. 3a), and the D-box co-receptor 
lying in a channel between blades 1 and 7 (Fig. 4a). The KEN-box 
residues of Mad3 (Lys 20, Glu 21 and Asn 22) emerge from the C 


Figure 1 | Structure of S. pombe MCC trimer. 

a, Cartoon representation of the complex of Mad2 
(green), Mad3 (cyan) and Cdc20 (yellow). Mad2 is 
in the C-Mad2 conformation. The KEN box is 
shown in red, located in the HLH motif of Mad3. 
The D-box mimic (magenta) that is bound to 
Cdc20 is from the C terminus of Mad3 from a 
symmetry-related molecule. The Mad2 β8′-β8″ 
hairpin (dark green), which forms the Mad2-Mad3 
interface, repositions on conversion from O-Mad2 
to C-Mad2. The N terminus of the WD40 domain 
is indicated. b, Details of the Mad3 HLH 
interaction with Mad2 and Cdc20. c, Surface 
representation of the MCC. The interaction 
between Mad2 and Mad3 positions the Mad3 KEN 
box at the KEN-box receptor, which is located at 
the centre of the top side of Cdc20’s WD40 domain. 
MB, Mad2-binding. 
terminus of the α-A helix of the HLH as an underwound turn of 
α-helix (Figs 1 and 3b). Their conformation resembles a ‘U’, with 
Glu 21 at the turn dipping into a depression at the centre of the 
Cdc20 β-propeller toroid. Tyr 181 of Cdc20, situated on an extended 
loop connecting blade 1 with blade 7, acts as a platform to support the 
aliphatic moiety of the Glu 21 side chain. Apart from this single 
hydrophobic interaction, the KEN box forms entirely polar contacts 
involving its side-chain and main-chain groups with conserved polar 
and charged residues of loops on four blades of the Cdc20 β-propeller. 
Notably, the five residues of Cdc20 that contribute side-chain 
interactions to the KEN box (Asp 180, Asn 326, Thr 368, Gln 392 
Figure 2 | Details of Cdc20-Mad2-Mad3 
interactions. a, Mad2-binding motif of Cdc20 
bound to the Mad2 safety belt. b-d, A universal Mad2 
dimer interface contacts Mad3, p31comet and 
O-Mad2. b, Mad2-Mad3 interactions. c, Mad2-p31comet 
(ref. 31) (Protein Data Bank (PDB) 
accession 2QYF). d, C-Mad2-O-Mad2 (ref. 37) 
(PDB accession 2V64). 




and Arg 438) are invariant in both Cdc20 and Cdh1, indicating a 
universal mode of co-activator-KEN box interaction (Supplementary 
Fig. 5). A proline (Pro 25) positioned three residues C-terminal 
to the Mad3 KEN box, well conserved in KEN-box motifs, acts to 
break the α-helix and orient the polypeptide chain away from the 
Cdc20 surface. 

The Mad3 KEN box has been proposed to function as a pseudo- 
substrate inhibitor, blocking access of Cdc20 to KEN-box degrons in 
APC/C substrates. Thus, the mode of binding of the Mad3 KEN box 
to Cdc20 should serve as a model for understanding how co-activators 
recognize APC/C substrates. Consistent with this idea, mutating the 


Figure 3 | The KEN box binds to a conserved 
surface at the centre of the top side of the WD40 
domain. a, Cdc20 shown with surface 
representation coloured according to the sequence 
conservation of Cdc20 and Cdh1 (yellow 
(invariant) to green (less conserved)) 
(Supplementary Fig. 5). Mad3 diagram with KEN 
box in red. b, Details of KEN-box interactions with 
the Cdc20 WD40 domain. c, Mutation of the KEN- 
box-binding site abolishes S. cerevisiae APC/C^Cdh1- 
mediated ubiquitination of securin as indicated by 
an in vitro transcription/translation (IVT)-based 
ubiquitination assay. KEN-box mutants: N405A, 
N407A, Q473A, R517L (S. pombe Cdc20: N324, 
N326, Q392, R438). 
Figure 4 | The D box binds in an extended 
conformation to a conserved inter-blade channel 
on the Cdc20 WD40 domain. a, Cdc20 shown 
with surface representation coloured according to 
sequence conservation of Cdc20 and Cdh1 (yellow 
(invariant) to green (less conserved)) 
(Supplementary Fig. 5). D-box mimic (C terminus 
of Mad3) is shown in purple. b, D-box mimic 
bound through a peptide-in-channel mechanism 
to Cdc20. Lys 92 of Mad3 contacts the negatively 
charged D-box Arg-binding site of Cdc20, 
sterically hindering the D-box receptor of Cdc20. 
c, Negative electrostatic potential of Cdc20 in the 
vicinity of the Arg-binding site of modelled D box. 
d, Model of a KEN-D peptide bound to KEN and 
D-box sites of Cdc20. e, Mutation of the D-box- 
binding site abolishes S. cerevisiae APC/C^Cdh1- 
mediated ubiquitination of securin as indicated by 
an IVT-based ubiquitination assay. D-box Arg- 
binding-site mutants: D256S, D536S, E537S (S. 
pombe Cdc20: D173, D457, E458). Mutated S. 
cerevisiae Cdh1 residues are indicated in black with 
equivalent residues of S. pombe in red. At the D-box 
Leu-binding site, mutation of S. pombe Cdc20 Val 
196 had the most notable effect, with little loss of 
activity for Leu460Met. Mutation of Arg 170, 
located distal to the Arg and Leu sites (Fig. 4c), did 
not affect APC/C^Cdh1 activity. f, Ubiquitination 
competition assay. The KEN-D box peptide 
inhibits S. cerevisiae APC/C^Cdh1-mediated securin 
ubiquitination fivefold more efficiently than does 
the D-box-KEN-box peptide (compare lanes 5 and 
10). Blocking the Leu-binding pocket by small-molecule 
ligands may be a potential mechanism to 



equivalents of the Cdc20 residues Asn 326, Thr 368, Gln 392 and 
Arg 438 in Saccharomyces cerevisiae Cdh1 abolished APC/C^Cdh1-catalysed 
ubiquitination of the KEN-box- and D-box-dependent APC/C 
substrates securin and Clb2 (Fig. 3c and Supplementary Fig. 6). Co- 
migration of APC/C and Cdh1 on a native gel confirmed that the 
inactivation was not due to misfolding of the mutant co-activator 
(Supplementary Fig. 7). Our findings that the KEN box of Mad3 is 
embedded within a segment of ordered structure, possibly stabilized 
on the Mad3 HLH motif through contacts to Mad2 (Fig. 1), suggest 
that degrons do not need to be disordered to allow substrate recog- 
nition by ubiquitin ligases. 

The WD40 domain of co-activators interacts with the D-box 
degron (RXXLXX(I/L)(S/T)N), acting as a co-receptor with the 
APC/C subunit Apc10 (refs 40, 41). However, the molecular details of 
co-activator-D box interactions have not been established. In the 
crystal structure of the MCC, an unexpected crystal packing contact, 
in which the C terminus of a neighbouring Mad3 subunit partially 
mimics a D box and contacts Cdc20, provides detailed molecular 
insights into co-activator-D box recognition. The D-box mimic of 
Mad3 was found in an extended conformation engaged within a 
conserved channel located between blades 1 and 7 on the rim of the 
β-propeller toroid (Fig. 4b and Supplementary Fig. 8). The interaction 
is dominated by the burial of the aliphatic side chain of Leu 215 of 
Mad3, mimicking the Leu of the D-box RXXL motif, within a deep 
pocket perfectly matching a Leu side chain, and created by non-polar 
residues invariant in Cdc20 and Cdh1 (Fig. 4a and Supplementary 
Fig. 5). Other consensus residues of the D box—the essential and 


inhibit APC/C-co-activator activity. 
invariant Arg and less-well-conserved (I/L)(S/T)N motif—are not 
represented in the Mad3 D-box mimic. However, a candidate 
recognition site for the Arg side chain of the D box can be assigned to a 
cluster of negatively charged residues (Asp 173, Asp 457 and Glu 458) 
located on two adjacent β-strands of blade 7 on the top side of the 
D-box-binding channel. These residues are suitably positioned to 
interact with an Arg residue that is N-terminal to the anchored Leu 
residue, and are conserved in Cdh1 (Fig. 4b, c and Supplementary Fig. 5). 
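
The degron consensus quoted above can be expressed as a simple sequence pattern. The following is a minimal sketch, assuming the nine-residue consensus RXXLXX(I/L)(S/T)N exactly as written (real D boxes are more degenerate, and the Mad3 mimic, which lacks the Arg, would not match). The function name and the example sequence are illustrative only and are not taken from the study.

```python
import re

# Consensus from the text: R-X-X-L-X-X-(I/L)-(S/T)-N, where X is any residue.
D_BOX = re.compile(r"R..L..[IL][ST]N")
KEN_BOX = re.compile(r"KEN")

def find_degrons(sequence):
    """Return (motif name, 1-based start, matched residues) for each hit."""
    hits = []
    for name, pattern in (("KEN", KEN_BOX), ("D box", D_BOX)):
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])

# Illustrative sequence (hypothetical); the D-box segment is the one used in
# the synthetic peptides described in the Methods.
example = "MSNKENEGPAAAQRAALSDITNSGG"
for hit in find_degrons(example):
    print(hit)
```

A strict pattern such as this is useful only as a first filter; the structural data indicate that the Leu anchor and the Arg site contribute separately to binding.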

To test whether the putative D-box-binding site that was identified 
on Cdc20 is responsible for recognizing the D box of APC/C substrates, 
we generated mutants to abolish individually the equivalent Arg- and 
Leu-binding sites in Cdh1. Disrupting the Leu-binding pocket by repla- 
cing Val 196 with a bulky methionine, and substituting Ser for the Cdh1 
equivalents of Asp 173, Asp 457 and Glu 458 at the putative Arg- 
binding site, eliminated the ability of Cdh1 to stimulate APC/C activity 
(Fig. 4e and Supplementary Fig. 6) but had no effect on co-activator 
binding to the APC/C (Supplementary Fig. 7). These results are con- 
sistent with the inter-blade channel of Cdc20 functioning as the D-box- 
binding site. The region of this channel that probably contacts the 
D-box (I/L)(S/T)N motif is less well conserved between Cdc20 and 
Cdh1 (Fig. 4a), possibly explaining the differential affinity of the two 
co-activators for D-box degrons. 


Substrate and inhibitor recognition by co-activators 


To characterize further the APC/C degron-binding sites on co- 
activators, we designed peptides that incorporate both KEN-box 
and D-box motifs and tested their ability to inhibit APC/C^Cdh1. 
Examination of the Cdc20 structure suggested that a peptide with 
KEN and D boxes linked by 17 residues (KEN-D peptide) would 
enable cooperative binding of both degrons to their respective binding 
sites, thereby conferring higher-affinity binding than peptides with 
either degron alone (Fig. 4d). In contrast, a peptide with the reverse 
orientation of KEN and D boxes (D-KEN peptide), also connected by 
a 17-residue linker, would only permit a single degron to bind. When 
we compared the potential of these peptides to competitively inhibit 
securin ubiquitination by APC/C^Cdh1, we found that the KEN-D 
peptide was five times more potent as an inhibitor than the D-KEN 
peptide (Fig. 4f). Because D-KEN peptide inhibited APC/C^Cdh1 with 
the same efficiency as a D-box peptide (Supplementary Fig. 9a), these 
results indicate that cooperative degron binding to Cdh1 was 
conferred by the specific spatial arrangement of the KEN and D boxes of 
the KEN-D peptide, in agreement with our assignment of the KEN- 
and D-box-binding sites on co-activators. 

The spatial arrangement of KEN and D boxes in the KEN-D 
inhibitory peptide is markedly similar to the organization of KEN 
and D boxes in two APC/C inhibitors: Acm1 and Mes1 (refs 43-45) 
(Supplementary Fig. 10). Both proteins inhibit the APC/C co-activators 
through a pseudosubstrate-based mechanism that is dependent on 
their D and KEN boxes, and it seems reasonable to assume that the 
spacing of 18 and 24 residues between the KEN and D box degrons of 
Acm1 and Mes1, respectively, optimizes inhibitor-co-activator affinity. 
Here we show one mechanism by which a KEN- and D-box-containing 
protein would bind co-activators. However, because of the diverse 
configuration and relative separation of KEN- and D-box motifs in 
APC/C substrates, several modes of co-activator-substrate recognition 
probably exist. Our assignment of the D-box recognition site to an 
inter-blade channel of the co-activator β-propeller was recently 
confirmed by a Cdh1-Acm1 crystal structure (W.C.H.C., D.B. and J. He, 
unpublished observations). 

KEN- and D-box motifs adopt different conformations when 
bound to Cdc20; the KEN box is an underwound helix (Fig. 3b), 
whereas the D box assumes an extended structure (Fig. 4b). This 
important distinction means that the three residues of the KEN motif 
are presented to the same surface of Cdc20, whereas in the D box 
alternative amino acid side chains are oriented in opposite directions. 
Thus, with the Leu side chain of the D box anchored by the co- 
activator, conserved Arg, Ile/Leu and Asn side chains at positions 1, 
7 and 9 of the D box, respectively, would be accessible to generate a 
composite D-box-co-activator recognition surface for the D-box 
co-receptor Apc10 (refs 40, 41). This is consistent with the identification 
of the D-box co-receptor at the interface of co-activator and Apc10 in 
an APC/C^Cdh1-D-box ternary complex. 


Implications for MCC-mediated inhibition of the APC/C 


The processes underlying MCC-mediated inhibition of APC/C are 
incompletely defined, and probably involve several mechanisms. 
Mad3, which is dependent on its N-terminal KEN box, blocks 
Cdc20-mediated substrate recognition, consistent with our structure 
showing that the Mad3 KEN box binds to the KEN-box recognition site 
of Cdc20. Notably, the same KEN box also promotes Mad3-dependent 
APC/C-mediated degradation of Cdc20 (refs 18, 27, 29), which sug- 
gests that Mad3 has a role in positioning Cdc20 for ubiquitination by 
the APC/C’s catalytic centre. To understand this function of Mad3, 
we docked our MCC coordinates into the electron-microscope- 
derived molecular envelope of the APC/C-MCC complex (Fig. 5). 
Interpretation of this structure was based on our previous subunit 
assignment and pseudo-atomic model of budding yeast 
APC/C^Cdh1-D-box (ref. 46). The MCC crystal structure corresponds closely 
to the assigned MCC density of APC/C^MCC, with Cdc20, Mad2 and 
Mad3 clearly recognizable. Mad2 contacts the TPR subunits Cdc23 
and Apc5, whereas Mad3 interacts with Apc1. There is insufficient 
unassigned density to account for the C-terminal kinase domain of 
BubR1 and Bub3, suggesting their structural disorder. Notably, the 
Figure 5 | Pseudo-atomic structure of human APC/C^MCC. a, Docking of 
MCC coordinates into a human APC/C^MCC electron microscope map. APC/C 
subunit coordinates are based on the pseudo-atomic structure of S. cerevisiae 
APC/C. In the context of the APC/C^MCC, Cdc20 is in a downwards position 
relative to APC/C^Cdh1 (ref. 41). b, Details of MCC interactions with APC/C 
showing the position of the D-box co-receptor on Cdc20 relative to Apc10 and 
the catalytic centre of Apc2 and Apc11. The downwards position of Cdc20 
prevents interaction with Apc10. Surface representation of Cdc20 is colour- 
coded from yellow (invariant) to green (less conserved). 


KEN-box-binding site of Cdc20 is blocked by Mad3, whereas the 
D-box site is directed towards, but not in contact with, its co-receptor 
Apc10 (Fig. 5b). Compared with APC/C^Cdh1-D-box, in the APC/C^MCC 
complex Cdc20 is displaced downwards towards Apc5. This position 
may facilitate Cdc20 ubiquitination. Furthermore, the lower position of 
co-activator prevents its D-box-binding site from generating a bipartite 
D-box co-receptor with Apc10 (ref. 41). This, together with a partial 
steric blockade of the arginine site of the D-box co-receptor of Cdc20 
by Lys 92 of the Mad3 TPR domain (Fig. 4b), explains the inability of 
APC/C^MCC to recognize and ubiquitinate D-box-dependent 
substrates. In APC/C^MCC, a region of the Cdc20 D-box-binding site for 
the Leu anchor and C-terminal residues is accessible (Fig. 5b), 
suggesting that a non-consensus D-box sequence (that is, lacking an Arg residue 
and not dependent on the co-receptor Apc10) could engage this site. 

In addition to the interactions of Mad2 and Mad3 with APC/C 
subunits, the position and activity of Cdc20 in the context of 
APC/C^MCC might also be influenced by the sequestration of its N-terminal 
APC/C recognition motifs by Mad2. The Mad2-binding motif of 
Cdc20 also mediates APC/C interactions, which would be blocked 
by the Mad2 safety belt (Fig. 5b). Furthermore, constraining the 
Cdc20 N terminus by Mad2 might also prevent the neighbouring C 
box from accessing its APC/C-binding site, which is necessary for 
Cdc20-dependent stimulation of APC/C catalytic activity. 

The SAC is antagonized by p31comet (refs 30, 49), and the structure 
of the C-Mad2-p31comet complex showed that p31comet binds at the 
Mad2 dimerization interface to inhibit the conformational activation 
of O-Mad2 (ref. 31). However, it is clear from our MCC structure that 
because Mad3 and p31comet bind to a common Mad2 interface, 
p31comet would compete with Mad3 for binding to Mad2 (Fig. 2c), 
explaining how p31comet both antagonizes the assembly of the MCC 
and promotes its disassembly. 

Here we show how the molecular basis for APC/C inhibition by its 
regulators is coupled to degron recognition by co-activators. APC/C 
activity is modulated by sterically blocking substrate recognition and 
through conformational changes that disrupt the substrate-binding 
site, reminiscent of modes of protein kinase regulation. This contrasts 
with the SCF complex whose activity is regulated at the level of sub- 
strate recognition through degron phosphorylation. 


METHODS SUMMARY 


Expression, purification and crystallization of S. pombe MCC. The S. pombe 
MCC (involving Cdc20, Mad2 and Mad3) was generated by co-expressing 
Cdc20-Slp1 (residues 87-488), Mad2 (with Leu12Ala and Arg133Ala mutations) 
and Mad3 (residues 1-223) in the baculovirus and insect cell systems. 
Crystallization, crystal structure determination, mutagenesis, enzyme assays 
and other procedures were performed as described in the Methods. Diffraction 
data were collected at beamline I02 at the Diamond Light Source. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Cloning, expression and purification of the MCC complex. 
Schizosaccharomyces pombe CDC20-SLP1 (residues 87-488), MAD2 (residues 
1-203) and MAD3 (residues 1-223) genes were amplified by polymerase chain 
reaction (PCR) using an S. pombe complementary DNA library pTN-TH7 (a gift 
from T. Nakamura) as a template and cloned into a modified pFBDM vector 
(Z.Z. & D.B., unpublished observations). A double Strep-tag II (ds) together with 
a tobacco etch virus (TEV) cleavage site were introduced into the N terminus of 
Cdc20-Slp1 and mutations of Leu12Ala and Arg133Ala were introduced into 
Mad2 (ref. 31). A previous study showed that the Arg133Ala mutant was fully 
capable of inducing a mitotic arrest. The resultant protein expression cassettes 
were recombined with the DH10MultiBac cells to create a bacmid by 
transposition. The ds-Cdc20-Mad2-Mad3 complex was expressed using the baculovirus 
and insect cell (High 5 cells) systems, and purified by a combination of Strep- 
Tactin (Qiagen), anion exchange chromatography Resource Q and Superdex 200 
size-exclusion chromatography (GE Healthcare). The identities of the purified 
complex subunits were confirmed by antibody labelling and N-terminal sequen- 
cing. Mutagenesis was performed to generate the Mad2—Mad3 interface mutant, 
Cdc20-Mad2(A137Y)-Mad3(M13R). 

Crystallization, data collection and structure determination. Crystals were 
obtained in a hanging-drop fashion by pre-incubating 1:1 v/v of 4.5 mg ml⁻¹ 
of protein with the crystallization solution, 100 mM Tris-HCl pH 8.8, 21% PEG 
3350, 30% ethylene glycol, 5 mM dithiothreitol (DTT), and 5 mM EDTA at 20 °C 
for 24 h, followed by streak seeding. Crystals grew to full size after 2 weeks and 
were mounted in 0.2-0.3 mm cryoloops and frozen in liquid nitrogen. Native 
crystals diffracted to a minimum Bragg spacing (dmin) of about 2.1 Å. The 
diffraction data set was collected at the I02 beamline of Diamond Light Source from 
a single crystal, processed with XDS and scaled to 2.3 Å with SCALA. Phase 
information was obtained by molecular replacement with AMoRe. Monomeric 
coordinates from the crystal structures of human MAD2(L13A) (PDB 2VFX), 
human BUBR1 (PDB 2WVI) and human WDR5 (PDB 3EMH) were used as 
search models. All three trimeric complexes (MCC complexes) that are present in 
the asymmetric unit were assembled with repeated runs of AMoRe. Iterative 
model building and refinements were carried out with COOT and PHENIX, 
respectively. Simulated annealing composite omit maps were systematically 
calculated at several stages of model building and refinement, and examined to 
minimize the effects of model bias. TLS parameters that were generated from 
the TLS motion determination server (http://skuld.bmsc.washington.edu/ 
~tlsmd/) were used throughout the refinement. Water molecules were added 
towards the end of the refinement. The structure was validated with MolProbity. 
Data collection and refinement statistics are given in Supplementary Table 1. 
APC/C ubiquitination assays with wild-type and mutant Cdh1. Saccharomyces 
cerevisiae Cdh1 mutants were generated using PCR-based mutagenesis and 
cloned into a linearized pRSET vector. A functional T7 promoter and the Cdh1 
sequences were confirmed by DNA sequencing. APC/C ubiquitination assays 
were adapted and modified from ref. 61. 35S-labelled Clb2p and securin 
(Pds1p) and unlabelled Cdh1 mutants were prepared using TNT T7 Quick-coupled 
in vitro transcription/translation (IVT; Promega). Each ubiquitination 
reaction contained approximately 10 ng of S. cerevisiae APC/C, 1 µl of 
35S-labelled substrate and 2 µl of Cdh1 in a 10-µl reaction volume with 40 mM 
Tris-HCl pH 7.5, 10 mM MgCl2, 0.6 mM DTT, 2.7 mM ATP, 6.6 µg of methyl 
ubiquitin, 500 ng of Ubc4, 200 ng of ubiquitin aldehyde (Enzo Life Science) and 2 µM 
LLnL (N-acetyl-Leu-Leu-Norleu-aldehyde) (Sigma). Reactions were incubated at 
room temperature for 15 min and were analysed using 8% SDS-PAGE. Gels were 
fixed and stained with Coomassie blue, then dried and exposed to BioMax MR 
Film (Kodak). 

Native gel electrophoresis. Correct folding of S. cerevisiae Cdh1 mutants was 
assessed by their co-migration with the APC/C in native gel electrophoresis. 50 ng 
of apoAPC/C was mixed with 2 µl of 35S-labelled IVT-produced Cdh1 and 0.7 µl 
of 100 mM CaCl2 in a volume of 14 µl with 10 mM Tris pH 8.0, 150 mM NaCl, 3 
mM DTT, 1 mM magnesium acetate and 2 mM EGTA. Samples were incubated 
at room temperature for 15 min before adding 1 µl of native gel loading buffer 
(125 mM Tris pH 8.0, 84% (v/v) glycerol) to each reaction. The entire reaction 
was loaded onto a 5.25% non-denaturing polyacrylamide gel run at 4 °C, 110 V 
for 2 h. Gels were fixed and stained with Coomassie blue, then dried and exposed 
to film. 
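
The final concentration of an additive in a mix such as the one above follows from C1 × V1 / V2. The snippet below is a minimal sketch, assuming that the 14 µl quoted is the total volume including the CaCl2 stock (the text does not state this explicitly); the function name is illustrative.

```python
def final_concentration(stock_conc, stock_volume, final_volume):
    """Concentration after dilution (C1 * V1 / V2), in the units of stock_conc."""
    if final_volume <= 0:
        raise ValueError("final_volume must be positive")
    return stock_conc * stock_volume / final_volume

# Values quoted in the protocol: 0.7 µl of 100 mM CaCl2 in a 14 µl mix
# (assumed here to be the total volume).
cacl2_mM = final_concentration(stock_conc=100.0, stock_volume=0.7, final_volume=14.0)
print(f"CaCl2 in the binding mix: {cacl2_mM:.1f} mM")
```

Under that assumption the CaCl2 is present at about 5 mM, in excess of the 2 mM EGTA in the same buffer.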

KEN-D peptide competition assays. The KEN-D 
(Ac-NKENEGPASGASGASGASGAQRAALSDITNS-NH2), D-KEN 
(Ac-QRAALSDITNSGASGASGASGASNKENEGPA-NH2), KEN-Dmut 
(Ac-NKENEGPASGASGASGASGAQSAAASDITSS-NH2) and KENmut-D 
(Ac-NSASEGPASGASGASGASGAQRAALSDITNS-NH2) 
peptides were designed with a 17-residue linker between the KEN-box 
sequence (NKENEGPA) and the D-box sequence (QRAALSDITNS), or 
between their respective mutant sequences (NSASEGPA and QSAAASDITSS). 
The peptides were synthesized by Cambridge Peptides. APC/C ubiquitination 
reactions were performed in the same way as described above using the 
co-purified APC/C^Cdh1 complex. Peptides were added into the reaction to a final 
concentration ranging from 5 µM to 1000 µM. Reactions were incubated at room 
temperature for 15 min and were analysed using 8% SDS-PAGE. Gels were fixed 
and stained with Coomassie blue, then dried and exposed to film. 
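
The spacing between the two degrons in these peptides can be checked directly from the sequences. The sketch below assumes one particular way of counting, namely the residues lying between the Asn of KEN and the Arg of the D-box RXXL; on that reading the KEN-D peptide gives 17, whereas the Ser-Gly-Ala spacer between the two quoted box sequences is itself only 12 residues long. The function name and the choice of `RAAL` as the D-box marker are illustrative assumptions.

```python
# Peptide sequences as given in the Methods (terminal Ac-/-NH2 groups omitted).
PEPTIDES = {
    "KEN-D": "NKENEGPASGASGASGASGAQRAALSDITNS",
    "D-KEN": "QRAALSDITNSGASGASGASGASNKENEGPA",
    "KEN-Dmut": "NKENEGPASGASGASGASGAQSAAASDITSS",
    "KENmut-D": "NSASEGPASGASGASGASGAQRAALSDITNS",
}

def ken_to_dbox_gap(peptide, ken="KEN", dbox_start="RAAL"):
    """Residues between the end of KEN and the D-box Arg.

    Returns None if either motif is absent or the D box precedes the KEN box.
    """
    ken_pos = peptide.find(ken)
    dbox_pos = peptide.find(dbox_start)
    if ken_pos < 0 or dbox_pos < 0 or dbox_pos < ken_pos:
        return None
    return dbox_pos - (ken_pos + len(ken))

for name, sequence in PEPTIDES.items():
    print(name, len(sequence), ken_to_dbox_gap(sequence))
```

Only the KEN-D peptide returns a spacing; the reverse-orientation and mutant peptides return `None`, which mirrors their use as controls that present at most one intact degron in the KEN-then-D order.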

Fitting structure coordinates into the human APC/C^MCC EM map. The 
structure coordinates of S. pombe Cdc16-Cdc26 complex (PDB 2XPI), Cdc27 (PDB 
3KAE), Apc10 (PDB 1GQP), modelled Apc2 and MCC were fitted into the 
human APC/C^MCC map (EMD-1591) based on the S. cerevisiae APC/C 
assignment, using UCSF Chimera. 
Xing Wang Deng & Yigong Shi 


The Arabidopsis thaliana protein UVR8 is a photoreceptor for ultraviolet-B. Upon ultraviolet-B irradiation, UVR8 
undergoes an immediate switch from homodimer to monomer, which triggers a signalling pathway for ultraviolet 
protection. The mechanism by which UVR8 senses ultraviolet-B remains largely unknown. Here we report the crystal 
structure of UVR8 at 1.8 Å resolution, revealing a symmetric homodimer of seven-bladed β-propeller that is devoid of any 
external cofactor as the chromophore. Arginine residues that stabilize the homodimeric interface, principally Arg 286 and 
Arg 338, make elaborate intramolecular cation-π interactions with surrounding tryptophan amino acids. Two of these 
tryptophans, Trp 285 and Trp 233, collectively serve as the ultraviolet-B chromophore. Our structural and biochemical 
analyses identify the molecular mechanism for UVR8-mediated ultraviolet-B perception, in which ultraviolet-B radiation 
results in destabilization of the intramolecular cation-π interactions, causing disruption of the critical intermolecular 
hydrogen bonds mediated by Arg 286 and Arg 338 and subsequent dissociation of the UVR8 homodimer. 


Perception of light is important for all kingdoms of life. Light regulates 
the circadian clock in worms and social activity in fruitflies. In plants, 
light is a major source of energy and regulates all major developmental 
and physiological processes. A wide wavelength range of 
light is sensed by specific families of photoreceptors: phytochrome for 
red and far red; phototropin and cryptochrome for ultraviolet-A 
and blue; and UVR8 for ultraviolet-B (wavelength range 280-315 nm). 
Except for UVR8, all other photoreceptors contain specific 
external cofactors as chromophores: bilin for phytochrome; FAD and 
MTHF for cryptochrome; and FMN for phototropin. It remains 
unclear whether UVR8 contains any external cofactor for ultraviolet-B 
perception. 

UVR8, originally identified as a regulatory protein for ultraviolet-B-triggered 
signal transduction, was recently shown to be a receptor 
for ultraviolet-B. Ultraviolet-B perception was thought to induce 
dissociation of the UVR8 homodimer, allowing its subsequent interaction 
with COP1 and transcriptional activation of ultraviolet-B-responsive 
genes (Fig. 1a). A number of tryptophan residues, 
particularly Trp 285, were shown to have an important role in 
ultraviolet-B-triggered signalling. Despite these advances, it remains 
unknown how ultraviolet-B is sensed by UVR8 or how ultraviolet-B 
perception leads to dissociation of the UVR8 homodimers. 


Biochemical characterization of UVR8 

The full-length, wild-type UVR8 and two variants, W285F and 
W285A, were purified to homogeneity. As reported, both wild type 
and W285F existed mainly as homodimers on SDS-polyacrylamide 
gel electrophoresis (SDS-PAGE) in the absence of heating (Fig. 1b, 
lanes 1-2). However, only wild type, not W285F, was able to undergo 
ultraviolet-B-induced monomerization (lanes 7-8). By contrast, the 
variant W285A appeared only as a monomer both before and after 
ultraviolet-B irradiation (lanes 3 and 9). Heating at 96 °C in the presence 
of SDS reduced all UVR8 homodimers to a monomeric state (lanes 4-6). 
These results are in agreement with published observations. 


The ionic detergent SDS is a protein denaturant. Homodimer 
formation of wild-type UVR8, however, is remarkably stable and 
resists treatment by up to 12% SDS in the absence of heating 
(Supplementary Fig. 1a). Because the SDS sample buffer contains 
200 mM dithiothreitol (DTT), we speculated that two molecules of 
UVR8 are held together through a covalent linkage (such as a 
disulphide bond) that is susceptible to heating in the presence of 
DTT or ultraviolet-B irradiation. Quite unexpectedly, elevated ionic 
strength led to efficient conversion of the UVR8 homodimers to a 
monomeric state (Supplementary Fig. 1b). This observation strongly 
suggests that the forces that hold together two UVR8 molecules are 
ionic in nature. 

Use of SDS-PAGE may not allow faithful evaluation of native 
protein conformation. To alleviate this potential problem, we used 
the more sensitive method of gel filtration to examine the oligomeric 
state of UVR8 under physiological pH and ionic strength (Fig. 1c). 
Before ultraviolet-B irradiation, wild-type UVR8 was eluted from gel 
filtration with an apparent molecular mass of approximately 150 kDa, 
larger than that calculated for a UVR8 homodimer (~100 kDa). This 
discrepancy is probably caused by the extended flexible sequences at 
the amino and carboxy termini of UVR8. After ultraviolet-B irradiation, 
the elution volume for wild-type UVR8 corresponded to a 
molecular mass of about 75 kDa (Fig. 1c). This observation confirms 
the reported, ultraviolet-B-induced dimer-to-monomer switch. As 
anticipated, the variant UVR8(W285F) appeared as a homodimer 
irrespective of ultraviolet-B radiation (Fig. 1c). However, unlike the 
SDS-PAGE result (Fig. 1b), UVR8(W285A) existed mainly as a 
homodimer both with and without ultraviolet-B treatment (Fig. 1c). 
Ultraviolet-B irradiation seemed to have weakened formation of the 
W285A homodimer as judged by the presence of a small fraction of 
monomers (Fig. 1c). 

The WD40 repeats of UVR8 are thought to be responsible for 
ultraviolet-B perception. Supporting this notion, the protease-resistant 
core domain of UVR8 (residues 12-381; Supplementary 

UVR8. a, A schematic diagram of ultraviolet-B-induced, UVR8-mediated 
signalling cascade. Upon receiving ultraviolet-B (UV-B) radiation, the UVR8 
homodimer dissociates into monomers. The UVR8 monomer then associates 
with COP1, ultimately resulting in the activation of ultraviolet-B-responsive 
genes. b, The wild-type (WT) UVR8 undergoes a dimer-to-monomer switch in 
response to ultraviolet-B radiation as judged by SDS-PAGE. c, Solution 
behaviour of UVR8 in response to ultraviolet-B radiation. Shown here are gel 
filtration chromatograms. Only the wild-type UVR8, but not the variants 
W285F or W285A, switched from a homodimer to a monomer in response to 
ultraviolet-B radiation. d, Crystals of the UVR8 core domain (residues 12-381) 
were cracked by ultraviolet-B radiation. e, Overall structure of the UVR8 core 
domain. All structural figures were prepared with PyMOL. 


Fig. 1c) retained the same ability as the full-length protein to undergo 
an ultraviolet-B-induced, dimer-to-monomer switch (Supplementary 
Fig. 1d). To elucidate the mechanism of ultraviolet-B perception by 
UVR8, we crystallized its core domain. Intriguingly, ultraviolet-B 
irradiation resulted in cracking of these crystals (Fig. 1d), indicating 
that the UVR8 core domain retained the ability to sense ultraviolet-B 
in the crystals. We also crystallized the core domain variants W285F 
and W285A. By contrast, the W285F and W285A crystals failed to 
crack even after prolonged ultraviolet-B irradiation (Supplementary 
Fig. 2), consistent with loss of ultraviolet-B responsiveness (Fig. 1c). 


Overall structure of UVR8 

The crystal structure of the UVR8 core domain (residues 12-381), 
which represents 84% of the full-length UVR8 protein, was deter- 
mined by selenium-based, single-wavelength anomalous dispersion 
(SAD). The atomic models of the UVR8 core domain and its variants 
W285F and W285A were refined at resolutions of 1.8, 2.0 and 1.8 Å, 
respectively (Supplementary Table 1 and Supplementary Fig. 3). The 
UVR8 core domain forms a seven-bladed β-propeller (Fig. 1e). In 
contrast to all previously determined structures of photoreceptors, 
UVR8 does not contain any external cofactor as the chromophore. 
Unlike canonical WD-40 repeats, each blade in UVR8 comprises 
only three β-strands, termed A, B and C (Supplementary Fig. 4a). An 
extended loop, designated loop CD, follows strand C in each blade. By 
convention, loops AB and CD reside on the bottom face of the 
β-propeller whereas the BC loop is located on the top face (Fig. 1e). 
A prominent sequence motif, GWRHT, is present in the AB loops of 
blades 5-7 (Supplementary Fig. 4a). The seven blades in UVR8 have a 
nearly identical main-chain conformation, which is similar to that of the 
cell cycle regulatory protein RCC1 (ref. 20; Supplementary Fig. 4b, c). 
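
The statement that the core domain represents 84% of the full-length protein can be checked with simple residue arithmetic. The Python sketch below assumes both residue ranges are inclusive (residues 12-381 for the core domain; residues 1-440 for full-length UVR8, as given in the Methods). The function and variable names are illustrative only.

```python
# Residue ranges as stated in the text; both are treated as inclusive.
FULL_LENGTH = (1, 440)
CORE_DOMAIN = (12, 381)


def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last - first + 1


def coverage(core, full) -> float:
    """Fraction of the full-length protein covered by the core domain."""
    return span_length(*core) / span_length(*full)


core_len = span_length(*CORE_DOMAIN)
full_len = span_length(*FULL_LENGTH)
fraction = coverage(CORE_DOMAIN, FULL_LENGTH)
print(f"{core_len} of {full_len} residues ({fraction:.0%})")
```

Under these assumptions the core domain spans 370 of 440 residues, which agrees with the quoted figure.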

In the crystals, two molecules of the UVR8 core domain form a 
homodimer. Two surface patches of complementary charges are 
located on the bottom face of each core domain (Fig. 2a). The acidic 
surface patch comprises five negatively charged amino acids, Glu 43, 
Asp 44 and Glu 53 from blade 1 and Asp 96 and Asp 107 from blade 2. 
The basic patch contains four positively charged residues, Lys 252 
from blade 5, Arg 286 from blade 6, and Arg 338 and Arg 354 from 
blade 7. The acidic and basic surface patches from one UVR8 core 
domain associate with the complementarily charged surface patches 
of another core domain to form a symmetric homodimer (Fig. 2b and 
Supplementary Fig. 5). This homodimeric interface, involving 2,566 Å² 
buried surface area, is mediated by 32 intermolecular hydrogen bonds. 

Charged amino acids in the two prominent surface patches contri- 
bute a total of 20 intermolecular hydrogen bonds at the homodimeric 
interface (Fig. 2c). In blade 1 of one UVR8 molecule, Glu 43, Asp 44 
and Glu 53 accept four charge-stabilized hydrogen bonds from Arg 338 
and Arg 354 in blade 7 of the other UVR8 molecule. Arg 354 also 
donates a hydrogen bond to the carbonyl oxygen of residue 52 in 
blade 1. In blade 2, Asp 96 and Asp 107 mediate four intermolecular 
hydrogen bonds from Arg 286 of blade 6, whereas Ser 106 receives an 
intermolecular hydrogen bond from Lys 252 of blade 5. In addition to 
the two prominent surface patches, residues in blade 3 of one UVR8 
molecule associate symmetrically with amino acids in blades 4 and 3 
from the other UVR8 molecule, contributing 12 additional inter- 
molecular hydrogen bonds (Fig. 2c, bottom right). 

All 32 intermolecular hydrogen bonds involve side chains, of which 
28 are mediated exclusively by side chains. Apart from two contacts 
between Gln 148 and Asn 149 (Fig. 2c, bottom right), all 30 other 
hydrogen bonds rely on charged amino acids and 24 are made 
between residues of opposite charges. Notably, Arg 286 of blade 6 
contributes eight intermolecular hydrogen bonds (four from each 
molecule) at the homodimeric interface. This analysis explains the 
finding that homodimerization of UVR8 is disrupted by elevated ionic 
strength (Supplementary Fig. 1b). 
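
The hydrogen-bond counts quoted in the preceding paragraphs can be cross-checked against one another. The short Python sketch below only re-adds the numbers stated in the text; the variable names are illustrative and no structural data are read.

```python
# Counts quoted in the text for the UVR8 homodimeric interface.
from_charged_patches = 20   # bonds contributed by the two charged surface patches
from_blade_3_contacts = 12  # additional bonds from the blade 3 / blade 4 contacts
total = from_charged_patches + from_blade_3_contacts

uncharged_contacts = 2      # the Gln 148 / Asn 149 contacts
involving_charged = total - uncharged_contacts

side_chain_only = 28        # bonds mediated exclusively by side chains
opposite_charge = 24        # bonds between residues of opposite charge
from_arg_286 = 8            # four from each molecule of the dimer

print(f"total intermolecular hydrogen bonds: {total}")
print(f"involving charged residues: {involving_charged}")
print(f"side-chain-only fraction: {side_chain_only / total:.1%}")
print(f"opposite-charge fraction: {opposite_charge / total:.1%}")
print(f"Arg 286 share: {from_arg_286 / total:.1%}")
```

The totals reproduce the 32 bonds and the 30 bonds involving charged residues given in the text, and show that Arg 286 alone accounts for a quarter of the interface hydrogen bonds.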


Intramolecular cation-π interactions 


The overall structures of the UVR8 variants W285F and W285A are 
nearly identical to that of the core domain, with a pair-wise root-mean-squared 
deviation of less than 0.5 Å for all aligned Cα atoms 
(Fig. 3a, left). Analysis of the local structural features surrounding 
residue 285 reveals no significant conformational changes between 
the core domain and the variant W285F (Fig. 3a and Supplementary 
Fig. 6). The aromatic side chain of Phe 285 in W285F occupies the 
same position, with a similarly planar orientation, as Trp 285 in the 
core domain. By sharp contrast, major changes occur to three critical 
residues in the variant W285A (Fig. 3a and Supplementary Fig. 6). 
Compared to wild-type UVR8, the indole ring of Trp 337 swings 
approximately 4 Å to occupy the space vacated by replacement 
of Trp 285 by Ala. Consequently, the indole ring of Trp 233 is rotated 
180° around the Cβ-Cγ axis, which in turn drives a 3.8-Å movement 
by the carboxylate group of Asp 129. 

Cation-π interactions, known to stabilize protein structure, make 
considerably more contribution to free energy terms than simple ionic 
interactions. Arginine and tryptophan are the most preferred 
amino acids for cation-π interactions, with strong interactions for 
the distance range of 3.4-4.5 Å between the cation and the aromatic 
ring. A detailed analysis of the UVR8 structure revealed an extensive 
network of intramolecular cation-π interactions surrounding Trp 285 
and Arg 286 in blade 6 (Fig. 3b). 

At the centre of the network, Arg 286 is surrounded by four 
aromatic amino acids (Fig. 3b). The guanidinium group of Arg 286 
is positioned at approximately 3.8 and 4.2 Å away from the indole 

Figure 2 | Structural basis of UVR8 homodimer formation. a, The bottom 
face of the UVR8 propeller contains two surface patches of complementary 
charges. The left and right panels depict the cartoon and electrostatic surface 
representations of the UVR8 core domain, respectively. Five acidic amino acids 
(coloured red) from blades 1 and 2 constitute the negatively charged patch, 
whereas four basic residues (coloured blue) form the positively charged patch. 


rings of Trp 302 and Trp 285, respectively, allowing cation-π interactions 
of maximal strength. Arg 286 also interacts with the phenyl 
ring of Tyr 253 and weakly associates with Trp 250. In addition, the 
indole ring of Trp 285 associates with the guanidinium group of 
Arg 338 through strong cation-π interactions, whereas the indole ring 
of Trp 285 binds to Trp 233 and Trp 337 through π-π stacking interactions 
(Fig. 3b). The essence of π-π stacking is the cation-π interaction, 
as the edge of the tryptophan indole ring (around the amine 
group) carries net positive charges. Furthermore, Trp 233 appears to 
have an important role in this network by making π-π stacking interactions 
with Trp 337 and strong cation-π interactions with Arg 234 
(Fig. 3b, c). It is unusual to have such a high density of cation-π 
interactions within a protein structure (Fig. 3c). 
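
The distance criterion used above can be expressed as a simple check. The Python sketch below applies the 3.4-4.5 Å range quoted in the text for strong cation-π interactions to the two Arg 286 distances reported here. The function name is illustrative, and no cutoffs are assumed for medium or weak interactions because the text does not state them.

```python
# Range quoted in the text for strong cation-pi interactions (angstroms).
STRONG_MIN, STRONG_MAX = 3.4, 4.5


def is_strong_cation_pi(distance: float) -> bool:
    """True if a cation-to-ring distance falls within the quoted strong range."""
    return STRONG_MIN <= distance <= STRONG_MAX


# Approximate distances quoted in the text for the Arg 286 guanidinium group.
distances = {
    ("Arg 286", "Trp 302"): 3.8,
    ("Arg 286", "Trp 285"): 4.2,
}

for (cation, ring), d in distances.items():
    label = "strong" if is_strong_cation_pi(d) else "outside the strong range"
    print(f"{cation} - {ring}: {d:.1f} A ({label})")
```

Both quoted distances fall inside the strong range, consistent with the description of Arg 286 as the centre of the network.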


Ultraviolet-B chromophore identification 
Because UVR8 contains no external cofactor, the chromophore for 
ultraviolet-B perception must be amino acid(s). Among the 20 naturally 
occurring amino acids, only tryptophan and tyrosine, with maximal 
absorption wavelengths of 280 and 275 nm, respectively, are potentially 
capable of perceiving ultraviolet-B (280-315 nm). The UVR8 core 
domain contains thirteen tryptophan and eight tyrosine residues. Six 
of the thirteen tryptophans are located in the hydrophobic core and 
away from the homodimeric interface (Supplementary Fig. 7), making 
them unlikely candidates for ultraviolet-B perception. Among the eight 
tyrosine residues, only Tyr 253 is located close to the homodimeric 
interface (Fig. 3b). This analysis suggests that the chromophore for 
ultraviolet-B is among the seven tryptophan residues at the homodi- 
meric interface. 

Ultraviolet-B perception is probably coupled with chemical and/or 
conformational changes around the chromophore, presumably lead- 
ing to alteration of its fluorescence emission spectra. Mutation of the 

b, Formation of UVR8 homodimer is mediated by charge-stabilized hydrogen 
bonds, mainly from the two surface patches of complementary charges. The 
two UVR8 molecules are related to those in panel a by 90° rotations, as 
indicated in the figure. c, A close-up view on the intermolecular hydrogen 
bonds at the homodimeric interface. Hydrogen bonds are represented by red 
dashed lines. 


key chromophore should result in abrogation of its ability to sense 
ultraviolet-B and consequent loss of fluorescence changes that are 
characteristic of ultraviolet-B perception. To identify conclusively 
the chromophore, we individually mutated the seven tryptophan 
residues to phenylalanine. 

First, we examined the intrinsic tryptophan fluorescence emission 
spectra for wild-type UVR8. In the absence of prior ultraviolet-B treat- 
ment, wild-type UVR8 was continuously monitored at an emission 
wavelength of 335 nm, with an ultraviolet-B excitation wavelength of 
295 nm. Consistent with saturable ultraviolet-B perception, the fluorescence 
signal increased rapidly over the initial 200 s and reached a 
maximum after 400 s (Fig. 4a, top). Prolonged irradiation, however, 
resulted in a gradual decrease of the fluorescence signal (Supplemen- 
tary Fig. 8a), presumably due to fluorescence quenching. With prior 
ultraviolet-B treatment, the fluorescence signal of wild-type UVR8 
decreased very slowly over time (Fig. 4a, top), again due to fluorescence 
quenching. By sharp contrast, the UVR8 variants W285F and W285A 
completely lost the ability to perceive ultraviolet-B (Fig. 4a, bottom; 
Supplementary Fig. 8b). These observations identify Trp 285 as an 
essential component of the chromophore. 

Next, we examined the other six UVR8 missense proteins. Most 
notably, the variant W233F no longer showed an increase in fluor- 
escence induced by ultraviolet-B (Fig. 4b, left), suggesting that 
Trp 233 also has an essential role in ultraviolet-B perception. By sharp 
contrast, each of the five variants W337F, W302F, W250F, W198F 
and W94F retained the ability to sense ultraviolet-B and to undergo 
ultraviolet-B-induced dimer-to-monomer switches (Fig. 4b and Sup- 
plementary Fig. 8c; W94F, data not shown). These analyses unambigu- 
ously identify Trp 285 and Trp 233 collectively as the chromophore for 
ultraviolet-B perception. Consistent with this conclusion, Trp 285 and 
Trp 233 are involved in considerably more and stronger cation-π 

Figure 3 | Structural features of the ultraviolet-B-sensing amino acids. 

a, Structural comparison of UVR8 core domain (coloured green) with the 
variants W285F (red) and W285A (blue). Significant conformational changes 
occur to Trp 337, Trp 233 and Asp 129. b, A close-up view on the putative 
ultraviolet-B-sensing residues. Arginine and tryptophan residues are 
differentially coloured. c, A schematic diagram of the extensive network of 


interactions compared to other tryptophans such as Trp 337 and 
Trp 198 (Supplementary Fig. 9). 

Ultraviolet-B perception by Trp 285 and Trp 233 probably results 
in disruption of the cation-π interactions with key arginine residues, 
causing disruption of arginine-mediated intermolecular hydrogen 
bonds and consequent dissociation of UVR8 homodimers. To 
identify the key arginine residues, we generated five UVR8 variants, 
each targeting an arginine at the UVR8 homodimeric interface. 
Compared to wild-type UVR8, the variants R286A and R338A exhib- 
ited a grossly altered ability to perceive ultraviolet-B (Supplementary 
Fig. 10, left). Notably, these two UVR8 variants existed exclusively in a 
monomeric state (Supplementary Fig. 10, right), consistent with the 
important role of Arg 286 and Arg 338 at the homodimeric interface. 
By contrast, the variants R354A, R200A and R146A retained the 
abilities to perceive ultraviolet-B and to undergo an ultraviolet-B- 
induced dimer-to-monomer switch (Supplementary Fig. 10). 


Transient nature of monomeric UVR8 


Absorption of ultraviolet-B is predicted to excite the indole rings of the 
chromophore Trp 285 and Trp 233. Because the excited indole rings 
may dissipate energy to return to their ground state, we suspected that 
the ultraviolet-B-irradiated UVR8 monomers may revert back to 
homodimers over time. To examine this prediction, we subjected the 
UVR8 core domain to a saturating amount of ultraviolet-B radiation, 
left the protein at room temperature (23 °C) to recover in the absence of 
ultraviolet-B, and applied aliquots of the protein to SDS-PAGE at 
various time points. The result clearly shows that the UVR8 monomer 
slowly but steadily converted back to the homodimeric state (Fig. 4c). 
The partially recovered UVR8 homodimers can be completely mono- 
merized again by ultraviolet-B irradiation (Fig. 4c, lane 8). The transient 
nature of monomeric UVR8 has obstructed repeated crystallization 
efforts, which were designed to capture the conformation of the ultra- 
violet-B-activated UVR8 core domain. The crystals were eventually 


cation-π interactions. Trp and Tyr are represented by red ovals, whereas Arg 
and Lys are shown in blue circles. The colours blue and red denote positive and 
negative charges, respectively. Purple circles denote cation-π or π-π 
interactions. Strong interactions, with a distance range of 3.4-4.5 Å, are 
represented by large circles. Medium and weak interactions are denoted by 
medium and small circles, respectively. 


obtained but only contained the homodimeric form (Fig. 4d). 
Obviously, ultraviolet-B-irradiated UVR8 core domain in the crystal- 
lization drops slowly reverted to a dimeric state before crystallization. 
Nonetheless, we solved this structure, which is identical to that of the 
UVR8 core domain before ultraviolet-B irradiation. 


Perspective 
Photoreceptors rely on chromophores to perceive light. For example, 


phytochrome covalently associates with a single bilin molecule, 
cryptochrome contains a non-covalently-bound FAD molecule, 
and phototropin depends on FMN. In contrast to all known photoreceptors, 
UVR8 does not contain any external cofactor and instead 
uses two tryptophan residues, Trp 285 and Trp 233, as the chromophore 
for ultraviolet-B perception. This seems natural, because the 
absorption wavelengths for tryptophan coincide with the range of 
ultraviolet-B. Consequently, the ultraviolet-B-sensing mechanism of 
UVR8 differs markedly from those of other photoreceptors. 

Our experimental evidence, in conjunction with knowledge of 
tryptophan fluorescence, yields a mechanistic model of ultraviolet-B 
perception by UVR8 (Fig. 4e). Ultraviolet-B irradiation results in 
excitation of the Trp 285 and Trp 233 indole rings, which is thought 
to disrupt the π-bond over the indole rings, leading to destabilization 
and abrogation of the intramolecular cation-π interactions. Such 
disruption is likely to trigger a conformational switch of the side chains 
of Arg 286 and Arg 338, which would no longer be able to maintain 
intermolecular hydrogen bonds with Asp or Glu residues from the 
neighbouring UVR8 molecule, causing dissociation of the UVR8 
homodimer (Fig. 4e). Furthermore, the excited indole rings are 
known to undergo a process of excited-state proton transfer, which 
allows the indole ring to carry a positive charge and completely 
destroys the cation-π interactions. Importantly, excited-state proton 
transfer also leads to quenching of intrinsic tryptophan fluorescence, 
providing a plausible explanation for the observed slow decrease of 

Figure 4 | Identification of Trp 285 and Trp 233 as the ultraviolet-B 
chromophore. a, Identification of Trp 285 as an essential ultraviolet-sensing 
amino acid. In the absence of prior ultraviolet-B (UV-B) radiation, the wild-type 
(WT) UVR8, but not the variant W285A, displayed a temporal, 
ultraviolet-B-induced increase of tryptophan fluorescence. This fluorescence 
increase completely disappeared after pre-irradiation by ultraviolet-B. b, The 
mutation W233F, but not W337F, in UVR8 led to loss of ultraviolet 
responsiveness as judged by tryptophan fluorescence (left panels). The variant 
W233F is no longer a stable homodimer, both with and without ultraviolet-B 
irradiation (top right). By contrast, the variant W337F can still undergo dimer- 
to-monomer switch in response to ultraviolet-B treatment (bottom right). 

c, Ultraviolet-B-irradiated UVR8 core domain slowly reverted back to a 
homodimeric state as judged by SDS-PAGE. Notably, 35 h after ultraviolet-B 


fluorescence signal. Notably, Asp 129, Glu 182 and Arg 234 are all 
located in close proximity to Trp 233 and Trp 285 (Supplementary 
Fig. 11) and may well serve as proton donors. In this model, ultra- 
violet-B perception involves no covalent modification of UVR8, such 
as tryptophan oxidation or crosslinking. This notion is supported by 
mass spectrometric analysis (Supplementary Fig. 12). Further sup- 
porting this conclusion, ultraviolet-B-induced monomerization of 
UVR8 was unaffected by the presence of strong reducing or oxidizing 
agents (Supplementary Fig. 13). 

irradiation, the partially recovered UVR8 homodimers were completely 
monomerized again by ultraviolet-B irradiation (lane 8, indicated by double 
asterisk). d, The crystals derived from ultraviolet-B-irradiated UVR8 core 
domain mostly contained the homodimeric form. Crystals were washed twice 
with crystallization buffer and examined by SDS-PAGE. e, A proposed 
mechanism for ultraviolet-B sensing by UVR8. In step 1, ultraviolet-B radiation 
excites Trp 285 and Trp 233. The excited states (purple wavy lines) of Trp 285 
and Trp 233 are unable to maintain the cation-π interactions with surrounding 
residues. In step 2, disruption of the intramolecular cation-π interactions 
results in pronounced changes of side-chain conformations (black arrows), 
disrupting the intermolecular hydrogen bonds and causing dissociation of 
UVR8 homodimers. In step 3, Trp 285 and Trp 233 dissipate energy to return to 
the ground state, which allows re-formation of homodimers. 


The observed fluorescence emission represents the total input from 
all 13 tryptophan residues and potentially other aromatic amino acids 
in UVR8. However, only those residues that are affected by ultraviolet-B 
perception or ultraviolet-B-induced environmental changes contribute 
to alteration of the fluorescence signal. The fact that a missense muta- 
tion of either Trp 285 or Trp 233 completely abrogates increase of the 
fluorescence signal identifies these two amino acids collectively as the 
chromophore for ultraviolet-B. In addition, this finding also strongly 
suggests that other tryptophan residues in UVR8 do not undergo any 
significant ultraviolet-B-induced environmental changes. This analysis 
also validates the measurement of intrinsic tryptophan fluorescence as a 
sensitive method for the detection of ultraviolet-B perception. 

Our study serves as a framework for understanding mutant pheno- 
types in plants. For example, the Arabidopsis mutants G145S and 
G202R lost the ability to perceive ultraviolet-B and the mutant proteins 
G145S and G202R were no longer able to form a homodimer. In our 
crystal structure, Arg 146 and Arg 200 make an important contribution 
to stabilize formation of the UVR8 homodimer (Fig. 2c); both Gly 145 
and Gly 202 are located in close proximity to Arg 146 and Arg 200, 
respectively (Supplementary Fig. 14). Thus the mutations G145S and 
G202R may have a deleterious consequence on homodimer formation. 

Despite the structural and mechanistic insights reported here, 
important questions remain about the UVR8-mediated signalling 
pathway. It is unclear how UVR8 interacts with the central regulator 
of light signalling, COP1. How the UVR8-COP1 complex regulates 
downstream signalling events is uncertain. Answers to these questions 
await further experimental investigations. 

While this manuscript was under final revision at Nature, Christie 
et al. reported similar findings in Science. 


METHODS SUMMARY 


Wild-type UVR8 and all variants were subcloned by standard molecular biology, 
expressed, and purified. Tryptophan fluorescence was measured in a fluorescence 
spectrophotometer (HITACHI F-4600), with excitation and emission wavelengths 
of 295 and 335 nm, respectively. Gel filtration was used to observe ultraviolet-B- 
induced conformational changes. All crystals were generated by the hanging-drop 
vapour-diffusion method. All data sets were collected at the Shanghai Synchrotron 
Radiation Facility (SSRF) beamline BL17U and the SPring-8 beamline BL41XU 
and processed using the HKL2000 package. Further processing was carried out 
using programs from the CCP4 suite. The UVR8 structure was solved using the 
SeMet-derived UVR8(W285A) crystals. The selenium positions were determined 
using the program SHELXD. A partial model was traced automatically using the 
program BUCCANEER. The resulting map was of good quality and the partial 
model was manually rebuilt in COOT. Sequence assignment was aided by the 
selenium sites in the anomalous difference Fourier map. The final structure was 
refined with PHENIX. Using UVR8(W285A) coordinates as the initial search 
model, crystal structures of W285F and wild-type UVR8 were solved by molecular 
replacement using PHASER and refined with COOT and PHENIX. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Protein preparation. All constructs of A. thaliana UVR8 (full-length, residues 
1-440) were cloned into pET-29b (Novagen) with a hexahistidine (6×His) tag at 
the C terminus. The plasmids were then transformed into Escherichia coli 
BL21(DE3) and overexpressed by induction with 0.2 mM isopropyl-β-D-thiogalactopyranoside 
(IPTG) at 18 °C overnight. The bacteria were harvested by centrifugation, resuspended 
in 150 mM NaCl, 25 mM Tris (pH 8.0), and lysed by sonication. After 
centrifugation, the supernatant was loaded onto Ni²⁺-NTA affinity columns 
(Qiagen), and washed with 150 mM NaCl, 25 mM Tris (pH 8.0). The target protein 
was eluted by 250 mM imidazole (pH 8.0), 25 mM Tris (pH 8.0) and further purified 
by a Source-15Q column (GE Healthcare). The protein was then concentrated, and 
purified by gel filtration (Superdex-200, 10/30, GE Healthcare) with a buffer containing 
150 mM NaCl, 25 mM Tris (pH 8.0) and 5 mM DTT. The proteins were ready for 
biochemical assay. The core domain of UVR8 (residues 12-381) was purified by gel 
filtration with the same protocol described above and concentrated to about 4 mg 
ml⁻¹ for crystallization. 

Ultraviolet-B radiation. The protein was exposed to ultraviolet-B radiation from 
an ultraviolet-B lamp (11 W, λmax = 308 nm), at a distance of 20 cm, for 30 min 
on ice before biochemical assay or crystallization. 

Tryptophan fluorescence measurements. The protein was adjusted to about 
1 µM in the buffer containing 150 mM NaCl, 25 mM Tris (pH 8.0), 5 mM 
DTT. Tryptophan fluorescence was measured in a fluorescence spectrophotometer 
(HITACHI F-4600). The excitation and emission wavelengths were 295 and 
335 nm, respectively. 

Gel filtration. Superdex-200 (10/30, GE Healthcare) was used to observe ultraviolet- 
B-induced conformational changes. The column was pre-equilibrated with 150 mM 
NaCl, 25 mM Tris (pH 8.0), 5 mM DTT, and calibrated with molecular weight 
standards (GE Healthcare). The protein with or without ultraviolet-B irradiation 
was injected into the column, and eluted with a flow rate of 0.4 ml min⁻¹. 
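
Apparent molecular masses from gel filtration are commonly read off a calibration line relating elution volume to the logarithm of the mass of the standards. The Python sketch below illustrates that general approach only; it is not the authors' analysis. The standards and elution volumes in it are hypothetical placeholder values, not data from this study, and the function names are illustrative.

```python
import math

# HYPOTHETICAL calibration standards: (elution volume in ml, molecular mass in kDa).
# These values are illustrative only and are not taken from the paper.
STANDARDS = [(9.0, 670.0), (11.0, 158.0), (13.0, 44.0), (15.0, 17.0)]


def fit_calibration(standards):
    """Least-squares fit of log10(mass) = slope * volume + intercept."""
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mass) for _, mass in standards]
    n = len(standards)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apparent_mass(volume_ml, slope, intercept):
    """Apparent molecular mass (kDa) for a given elution volume."""
    return 10 ** (slope * volume_ml + intercept)


slope, intercept = fit_calibration(STANDARDS)
for volume in (11.2, 12.4):  # hypothetical elution volumes for two samples
    print(f"{volume:.1f} ml -> {apparent_mass(volume, slope, intercept):.0f} kDa")
```

Because the calibration assumes compact, globular standards, a protein with extended flexible termini can elute earlier than expected and so appear larger than its calculated mass.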

Mass spectrometry. The untreated and ultraviolet-irradiated wild-type, full-length 
UVR8 samples were mixed with 0.5 µl of 10 mg ml⁻¹ α-cyano-4-hydroxycinnamic 
acid in 50% acetonitrile, 0.1% (v/v) TFA, and applied onto 
a MALDI plate. MALDI mass spectra were recorded with a MALDI-TOF/TOF 
mass spectrometer operated in the linear mode. Bovine serum albumin (BSA) was 
used as the internal standard. The mass difference for the whole protein before 
and after ultraviolet-B treatment is less than 1 Da, indicating that there is no mass 
change upon ultraviolet-B irradiation. 


For detection of potential protein oxidation, the untreated and ultraviolet- 
irradiated UVR8 samples were separated on SDS-PAGE, excised, and in-gel 
digested with trypsin at 37 °C overnight. The peptides were extracted twice with 
1% TFA in 50% acetonitrile for 30 min, and applied to LC-MS/MS analysis in 
the LTQ-Orbitrap mass spectrometer. The MS/MS spectra from each run 
were searched for possible tryptophan hydroxylation and formation of 
N-formylkynurenine. No apparent oxidation was detected in any tryptic fragment. 
This result suggests that ultraviolet radiation does not induce noticeable 
tryptophan oxidation. 

Crystallization. SeMet-derived UVR8(W285A) crystals were grown at 18 °C using 
the hanging-drop vapour-diffusion method by mixing 1.2 µl of SeMet-derived 
UVR8(W285A) protein with 1.2 µl of reservoir solution containing 23% (w/v) 
PEG3350, 100 mM Bis-Tris buffer (pH 6.0) and 0.2 µl 30% (w/v) 1,5-diaminopentane 
dihydrochloride. Wild-type UVR8 crystals were obtained by mixing 1.5 µl of protein 
with an equal volume of reservoir solution containing 18% (w/v) PEG8000, 100 mM 
Tris buffer (pH 9.2) and 200 mM magnesium chloride. The UVR8 variant W285F 
was crystallized similarly using a reservoir solution containing 17% (w/v) PEG8000, 
100 mM Tris buffer (pH 8.5) and 200 mM magnesium chloride. All native and 
SeMet crystals were directly flash-frozen in a cold nitrogen stream at 100 K. 

Data collection and structural determination. The data sets for UVR8(W285F) 
and SeMet-derived UVR8(W285A) were collected at the SSRF beamline BL17U; 
the wild-type UVR8 data were collected at the SPring-8 beamline BL41XU. All data 
sets were integrated and scaled using the HKL2000 package. Further processing 
was carried out using programs from the CCP4 suite. Data collection statistics are 
summarized in Supplementary Table 1. The UVR8 structure was solved using the 
SeMet-derived UVR8(W285A) crystals in the 222 space group. The selenium 
positions were determined using the program SHELXD. The identified selenium 
positions were refined and initial phases were calculated using the PHASER SAD 
experimental phasing module. Solvent flattening and histogram matching were 
applied to the electron density map in DM. A crude partial model was traced 
automatically using the program BUCCANEER; the model was then fed back to 
the program PHASER to combine SAD phasing and partial structure information. 
The resulting map was of good quality and the partial model was manually rebuilt in 
COOT. Sequence assignment was aided by the selenium sites in the anomalous 
difference Fourier map. The final structure was refined with PHENIX. Using 
UVR8(W285A) coordinates as an initial model, crystal structures of W285F and 
wild-type UVR8 were solved by molecular replacement using PHASER and 
manually refined with COOT and PHENIX. 

A close halo of large transparent grains around extreme red giant stars 


Barnaby R. M. Norris, Peter G. Tuthill, Michael J. Ireland, Sylvestre Lacour, Albert A. Zijlstra, Foteini Lykou, 
Thomas M. Evans, Paul Stewart & Timothy R. Bedding 


An intermediate-mass star ends its life by ejecting the bulk of its 
envelope in a slow, dense wind. Stellar pulsations are thought to 
elevate gas to an altitude cool enough for the condensation of dust, 
which is then accelerated by radiation pressure, entraining the gas 
and driving the wind. Explaining the amount of mass loss, 
however, has been a problem because of the difficulty of observing 
tenuous gas and dust only tens of milliarcseconds from the star. 
For this reason, there is no consensus on the way sufficient 
momentum is transferred from the light from the star to the outflow. 
Here we report spatially resolved, multiwavelength observations of 
circumstellar dust shells of three stars on the asymptotic giant 
branch of the Hertzsprung-Russell diagram. When imaged in 
scattered light, dust shells were found at remarkably small radii 
(less than about two stellar radii) and with unexpectedly large 
grains (about 300 nanometres in radius). This proximity to the 
photosphere argues for dust species that are transparent to the 
light from the star and, therefore, resistant to sublimation by the 
intense radiation field. Although transparency usually implies 
insufficient radiative pressure to drive a wind, the radiation field 
can accelerate these large grains through photon scattering rather 
than absorption, a plausible mass loss mechanism for 
lower-amplitude pulsating stars. 

We observed W Hydrae, R Doradus and R Leonis using aperture-masked 
polarimetric interferometry (Fig. 1), along with dust-free 
stars to verify our detection methodology. Figure 1 shows the ratio of 
the horizontally and vertically polarized visibilities (Vhoriz/Vvert), 
plotted as a function of baseline azimuth and length. The dust-free star 
2 Centauri, which has no polarized flux, shows a constant ratio 
Vhoriz/Vvert = 1.0 within its uncertainties. However, the dust-enshrouded star 
W Hya shows a strong sinusoidal variation of Vhoriz/Vvert with 
azimuth, as expected from a resolved circumstellar scattering shell. 
By taking advantage of spherical symmetry, we produced the much 
simpler baseline-dependent observable plotted in Fig. 2. A model was 
then fitted to the data to determine the dust shell radius and the 
amount of light scattered by the shell at each wavelength (Supplemen- 
tary Information). These results are summarized in Table 1. Figure 3 
shows the model image of the star and shell as seen in orthogonal 
polarizations for W Hya at a wavelength of 1.24 µm, from which the 
model visibilities were derived. 

Scattered-light dust shells around the three asymptotic giant branch 
(AGB) stars observed were found close to the star, at radii <2 stellar 
radii. This is in contrast to earlier models that place the shell at many 
stellar radii, but is consistent with some recent models and with 
interferometric and polarimetric measurements. On the basis of typical 
elemental abundances and spectral observations, the composition of 


Figure 1 | Polarimetric interferometry of W Hya at 1.24 µm. Although light 
scattered by each part of the circumstellar dust shell is strongly polarized, the 
integrated signal recovered with conventional polarimetry is zero for an 
unresolved spherically symmetric shell. In this study, aperture-masking 
interferometry (which converts the 8-m pupil of the Very Large Telescope 
into a multi-element interferometer, using the NACO instrument) allows 
access to the ~10-mas spatial scales required to resolve the shell, and 
polarimetric measurements (obtained by simultaneously measuring 
interferometric visibilities in orthogonal polarizations) allow light from the 
star and light from the dust shell to be differentiated. Here the ratio of the 
horizontally polarized visibilities (Vhoriz) to vertically polarized visibilities 
(Vvert) is plotted against baseline azimuth angle (corresponding to position 
angle on the sky). Colour encodes the baseline length (longest, 7.3 m; shortest, 
0.56 m). The ratio Vhoriz/Vvert is a differential observable, which allows the 
cancellation of residual systematic errors and depends only on the fractional 
polarized scattered light signal. a, Result for W Hya, an AGB star with a 
circumstellar shell; Vhoriz/Vvert deviates from 1, varying sinusoidally with 
azimuth. This is the signal expected from a thin, spherically symmetric dust 
shell scattering the light from a central star. This signal varies in amplitude for 
different baselines, encoding the spatial extent of the resolved structure. Data 
points have been repeated over two cycles. The longest baselines (red) have 
poor signal-to-noise ratios because they are close to the null, where the visibility 
curve of the star is extremely low. Errors, 1σ. b, Visibility data for the star 2 Cen, 
which has no circumstellar dust shell and, hence, no polarized signal from 
scattering; here Vhoriz/Vvert ≈ 1 for all azimuths. Errors, 1σ. 

AGB dust shells is expected to be dominated by silicates in the form 
of olivine (Mg₂ₓFe₂₍₁₋ₓ₎SiO₄) and/or pyroxene (MgₓFe₍₁₋ₓ₎SiO₃), 
where 0 ≤ x ≤ 1. The temperature of a grain is determined by its 
opacity, that is, how strongly it absorbs the surrounding radiation 
field. Multiwavelength models show that silicate dust that contains iron 
absorbs the stellar flux strongly (as these dust species have high opacities 
at wavelengths of ~1 µm, where the energy distribution peaks) and so 
can only condense at distances greater than ~5 stellar radii. These iron-rich 
species could be accelerated by absorption of stellar radiation, but 
they form too far from the star to provide an efficient mass loss mechanism 
for low-amplitude pulsators (semiregular variable stars). Our 
detection of dust much closer to the star is instead consistent with the 
presence of iron-free silicates such as forsterite (Mg₂SiO₄) and enstatite 
(MgSiO₃), which are almost transparent at wavelengths of ~1 µm. Such 
grains do not heat to sublimation, despite the intense radiation close to 
the star, but the same transparency also prevents the momentum 
transfer from starlight required to drive a wind. For some stars, a 


Table 1 | Summary of fitted model parameters 

Figure 2 | Wavelength dependence of scattering for W Hya and R Dor. The 
azimuthally reduced ratio Vhoriz/Vvert for the AGB stars W Hya and R Dor, 
plotted against spatial frequency (baseline length divided by wavelength, λ), at 
multiple wavelengths. The functional form of the visibility ratio excursions of 
all three AGB stars is consistent, within uncertainties, with a simple 
spherically symmetric shell. We were therefore able to enhance our signal-to-noise 
ratio significantly in a quantitative analysis by reducing our two-dimensional 
visibility data to a one-dimensional function of the baseline length 
(corresponding to spatial frequency). This was achieved by dividing Vhoriz/Vvert 
by the expected sinusoidal variation (characteristic of a spherical shell) at a fixed 
amplitude, resulting in the much simpler, baseline-dependent observable 
plotted here. Points are the observed data (binned; errors, 1σ) and the solid lines 
are the fitted model. The spatial frequency of the minimum of the characteristic 
‘dip’ varies with the radius of the dust shell: for a larger shell, the minimum of 
the dip occurs at lower spatial frequencies. The depth of the dip depends on the 
amount of scattered light, with a larger deviation from Vhoriz/Vvert = 1.0 
indicating that a larger fraction of the total flux arises from light scattered by the 
shell. This is seen to decrease strongly at longer wavelengths as expected 
theoretically; the precise change in this quantity as a function of wavelength can 
be used to determine the dust grain radius using Mie scattering theory (Fig. 4). 
The fitted parameters for these quantities are included in Table 1. The 2.06-µm 
data have insufficient spatial resolution to constrain the shell size, so for models 
at these wavelengths we fixed the shell size to be consistent with the fitted size at 
shorter wavelengths. 


possible solution to this dilemma arises when very large grains are 
considered. 

The degree of scattering by dust depends strongly on the wavelength 
of the incident light and on the size of the particles. By analysing our 
multiwavelength measurements using Mie scattering theory, we deter- 
mined the effective grain size and the number of grains. As shown in 
Fig. 4, we found an effective grain radius of ~300 nm. For grains of this 
size, the scattering opacity becomes very large, well beyond that result- 
ing from Rayleigh scattering when the particles are smaller. In this 
regime, the contribution to radiative acceleration by scattering, rather 
than by absorption alone, must be considered. Models show that grains 
exceeding a certain critical scattering opacity can drive a wind at high 
magnesium condensation and that, for a star of temperature 2,700 K, 
this critical scattering opacity is only exceeded in a narrow range of 
dust grain radii around 300 nm (ref. 8). We also note that a narrow 
range of grain radii, of the order of ~500 nm, is predicted on the basis 
of a self-regulating feedback mechanism: grain growth effectively halts 
once the critical size is reached, because the dust is then accelerated 
outwards and gas densities quickly decrease. This is consistent with 
observations of grains in the interstellar medium, which are dominated 
by silicates and have similar grain sizes. Wind driving due to 
scattering by magnesium-rich silicates is consistent with the finding 
that mass loss in AGB stars depends on their metallicity. Although 
this model encounters difficulties in the case of stars with extremely 
extended atmospheres, such as R Leo (owing to the mass of the stellar 
atmosphere at and above the dust-forming layers being too high to 
allow sufficient acceleration), it provides a plausible explanation 
for the mass loss of semiregular pulsating stars such as R Dor. Our 


Star    φ    λ (µm)  Rstar (mas)   Rshell (mas)   Scattered fraction  Grain radius (nm)  Scattering-shell mass 

R Dor   0.7  1.04    27.2 ± 0.2    43.3 ± 0.3     0.124 ± 0.003       299 ± 39           (2.7 ± 0.2) × 10⁻¹⁰ M⊙ 
             2.06    27.7 ± 1.4    43.6 ± 3.2     0.014 ± 0.002 

W Hya   0.2  1.04    18.7 ± 0.4    37.9 ± 0.2     0.176 ± 0.002       316 ± 4            (1.04 ± 0.02) × 10⁻⁹ M⊙ 
             1.24    18.9 ± 0.5    37.0 ± 0.3     0.110 ± 0.003 
             2.06    18.9 (fixed)  37.0 (fixed)   0.022 ± 0.004 

R Leo   0.4  1.04    18.3 ± 0.3    29.9 ± 0.4     0.120 ± 0.003       ~300*              ~2 × 10⁻¹⁰ M⊙ 


The radii of the dust shells were found to be <2Rstar. The scattered fraction is the proportion of the total flux arising from scattering by the dust shell. The grain radius was obtained from fitting to multiwavelength 
observations using Mie scattering (with the value for R Leo fixed at 300 nm). The scattering-shell mass was calculated only for the observed dust (and not, for example, for a distribution extending to small grains 
invisible to our technique). Stellar radii given are for a uniform disc. All three AGB stars (W Hya, R Dor and R Leo) were observed in March 2009, with additional observations of W Hya at 1.24 µm and 2.06 µm made in 
June 2010. Although the photospheric diameter, and possibly the dust shell diameter, are expected to vary throughout the stellar pulsation cycle, the two sets of observations for W Hya were taken approximately 
one period apart and have therefore been combined. These figures assume both the dust to be iron-free silicate (forsterite) grains of uniform size and there to be full magnesium condensation. Full magnesium 
condensation is a reasonable assumption for the stars with more compact atmospheres (for example R Dor) but is inconsistent with observed optical depths for stars with more extended atmospheres. In the 
event that there is also a population of small, weakly scattering grains that do not show up in our data, these values represent lower limits, with the total shell mass being greater. Furthermore, if the shell is 
geometrically extended then the true mass will be greater, as these calculations assume a thin shell. The uncertainties given are based on random errors and do not account for systematic errors such as those 
described here. Hipparcos parallaxes and experimentally measured optical constants have been used. φ indicates visual phase, derived from the AAVSO International Database; Rstar, stellar radius; Rshell, radius 
of dust shell. 

Figure 3 | Model image for W Hya with circumstellar shell viewed in 
horizontally and vertically polarized light. The white disc represents the 
uniform-disc star used in the model. A three-dimensional model of a star with a 
thin scattering shell was constructed, and the scattered intensity observed in 
each polarization for each point on the shell was calculated using Mie scattering, 
yielding an image of the star and shell. We then derived polarized visibilities 
from the model and fitted them to the observed visibilities, to determine the dust 
shell radius and the scattered fraction. Details of the modelling process can be 
found in Supplementary Information. See Supplementary Fig. 2 for a diagram 
illustrating how the polarized intensity distribution arises. 


observations provide direct evidence for a population of dust grains 
capable of powering a scattering-driven wind. 
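
The contrast between Mie and Rayleigh behaviour described above can be checked roughly against the W Hya scattered fractions in Table 1. The sketch below is not the fitting procedure used in the paper: it assumes that, for an optically thin shell, the scattered fraction scales with the per-grain scattering cross-section, so that pure Rayleigh scattering would follow λ⁻⁴. The function names are illustrative only.

```python
import math

# Scattered fractions for W Hya taken from Table 1 (wavelength in micrometres).
observed = {1.04: 0.176, 1.24: 0.110, 2.06: 0.022}


def rayleigh_prediction(ref_wavelength, ref_fraction, wavelength):
    """Scale a reference scattered fraction assuming cross-section ~ lambda^-4.

    Assumption: the scattered fraction is proportional to the scattering
    cross-section (optically thin shell); this is a simplification.
    """
    return ref_fraction * (ref_wavelength / wavelength) ** 4


def size_parameter(grain_radius_nm, wavelength_um):
    """Mie size parameter x = 2*pi*a/lambda; Rayleigh scattering needs x << 1."""
    return 2.0 * math.pi * grain_radius_nm / (wavelength_um * 1000.0)


ref_wl = 1.04
for wl in sorted(observed):
    predicted = rayleigh_prediction(ref_wl, observed[ref_wl], wl)
    x = size_parameter(316.0, wl)  # fitted effective grain radius for W Hya
    print(f"{wl:.2f} um: observed {observed[wl]:.3f}, "
          f"Rayleigh-scaled {predicted:.3f}, x = {x:.2f}")
```

Under these assumptions the Rayleigh-scaled values fall below the observed fractions at both 1.24 µm and 2.06 µm, and the size parameter is of order unity, which is in line with the statement that the grains lie outside the Rayleigh regime.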

The last column of Table 1 gives the mass of the dust that contri- 
butes to the observed scattering signal, assuming the shell to be thin 
and the dust grains to be forsterite of a uniform size. If full magnesium 
condensation and solar abundances are assumed, then the gas-to-dust 
ratio is ~600, which yields total shell masses of ~6 × 10⁻⁷ M⊙, 
~2 × 10⁻⁷ M⊙ and ~1 × 10⁻⁷ M⊙ for W Hya, R Dor and R Leo, 
respectively. Because the pulsation periods of these stars are ~1 yr 
and the mass loss rates are ~1 × 10⁻⁷ M⊙ yr⁻¹ (refs 20, 21), this 
implies that for stars with less extended atmospheres a large fraction 
of the observed shell is ejected each pulsation cycle, consistent with the 
observed dust being part of an outflow. In the extended-atmosphere 
case, where full magnesium condensation is not observationally 
supported, a possible alternative dust species is corundum (Al₂O₃), 

Figure 4 | Grain size measurement. Grain size fitted to the fraction of 
scattered light as a function of wavelength, for W Hya. Inset, magnified view of 
the 1.0-1.3-µm region. The solid line represents the fitted Mie scattering model 
(where grain size and grain number were fit parameters), and the dashed line 
represents the best Rayleigh fit (where grain size was fixed to be below the 
Rayleigh limit). The data are inconsistent with Rayleigh scattering. The fitted 
Mie model yields an effective grain radius of 316 ± 4 nm. In reality, a 
distribution of grain sizes may be present; for example, a population of very 
much smaller particles would contribute only weakly and could be undetected. 
Our data show that, regardless of the presence or absence of smaller grains, a 
population of large, ~300-nm, grains is required. Errors, 1σ. 

as discussed in Supplementary Information. Further time-dependent 
grain growth and dynamical models will help elucidate the role of light 
scattered from large-grained dust in the process of mass loss from AGB 
stars. 
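
The shell-mass argument made above is simple arithmetic and can be sketched as follows. The dust masses are those listed in Table 1, with their exponents read as 10⁻⁹ and 10⁻¹⁰ M⊙ (an assumption where the table is hard to read); the gas-to-dust ratio, mass loss rate and pulsation period are the approximate values quoted in the text. This is an order-of-magnitude illustration, not a reproduction of the authors' calculation.

```python
# Approximate values quoted in the text.
GAS_TO_DUST = 600.0       # gas-to-dust mass ratio
MASS_LOSS_RATE = 1e-7     # solar masses per year
PERIOD_YR = 1.0           # pulsation period in years

# Scattering-shell dust masses in solar masses (Table 1; exponents assumed).
dust_mass = {"W Hya": 1.04e-9, "R Dor": 2.7e-10, "R Leo": 2e-10}


def total_shell_mass(m_dust, ratio=GAS_TO_DUST):
    """Total (gas + dust) shell mass implied by the observed dust mass."""
    return m_dust * ratio


def fraction_ejected_per_cycle(m_shell, rate=MASS_LOSS_RATE, period=PERIOD_YR):
    """Mass lost in one pulsation cycle as a fraction of the shell mass."""
    return rate * period / m_shell


for star, m_dust in dust_mass.items():
    m_shell = total_shell_mass(m_dust)
    frac = fraction_ejected_per_cycle(m_shell)
    print(f"{star}: shell ~{m_shell:.1e} Msun, ~{frac:.0%} ejected per cycle")
```

With these inputs the implied total shell masses are all of order 10⁻⁷ M⊙, so the mass lost in one cycle is a substantial fraction of the shell for R Dor, as the text argues for stars with less extended atmospheres.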

Layered boron nitride as a release layer for mechanical transfer of GaN-based devices 


Yasuyuki Kobayashi, Kazuhide Kumakura, Tetsuya Akasaka & Toshiki Makimoto 


Nitride semiconductors are the materials of choice for a variety of 
device applications, notably optoelectronics and high-frequency/ 
high-power electronics. One important practical goal is to realize 
such devices on large, flexible and affordable substrates, on which 
direct growth of nitride semiconductors of sufficient quality is problematic. 
Several techniques (such as laser lift-off) have been 
investigated to enable the transfer of nitride devices from one 
substrate to another, but existing methods still have some important 
disadvantages. Here we demonstrate that hexagonal boron nitride 
(h-BN) can form a release layer that enables the mechanical transfer 
of gallium nitride (GaN)-based device structures onto foreign sub- 
strates. The h-BN layer serves two purposes: it acts as a buffer layer 
for the growth of high-quality GaN-based semiconductors, and pro- 
vides a shear plane that makes it straightforward to release the result- 
ing devices. We illustrate the potential versatility of this approach by 
using h-BN-buffered sapphire substrates to grow an AlGaN/GaN 
heterostructure with electron mobility of 1,100 cm² V⁻¹ s⁻¹, an 
InGaN/GaN multiple-quantum-well structure, and a multiple- 
quantum-well light-emitting diode. These device structures, ranging 
in area from five millimetres square to two centimetres square, are 
then mechanically released from the sapphire substrates and success- 
fully transferred onto other substrates. 

State-of-the-art growth technologies, such as metal-organic vapour 
phase epitaxy (MOVPE) and molecular beam epitaxy, enable single-crystal 
high-quality nitride devices to be grown on sapphire, silicon 
carbide and silicon substrates, but not on polycrystalline or amorphous 
substrates. Releasing single-crystal high-quality devices from one substrate 
and transferring them to another is one of the promising ways to 
address this limitation. In general, nitride semiconductors have been 
grown on sapphire substrates using a buffer layer: such a layer can 
consist of low-temperature AlN (ref. 7), low-temperature GaN (ref. 8) 
or AlON (ref. 9). However, releasing nitride semiconductors from 
sapphire substrates is difficult owing to the strong covalent sp³ bonding 
between the buffer layer and the semiconductors. To overcome this 
difficulty, thermal release with laser radiation, stamp-based printing, 
chemical release by etching of a sacrificial layer, wet chemical 
etching, and mechanical release from sapphire substrates have 
been investigated. Compared with the thermal and chemical 
approaches, mechanical release is, in principle, an uncomplicated technique 
requiring neither chemical treatments nor additional equipment 
for the transfer. However, damage, the limited size of the released GaN, 
and limited throughput in mechanical releases have remained serious 
issues. Graphite, graphene and h-BN act as ideal layers for releasing 
devices from substrates mechanically. However, direct growth of 
nitride semiconductors on graphene is impossible, and an intermediate 
layer, such as a zinc oxide (ZnO) nano-wall, is therefore needed. In 
addition, the transfer of nitride semiconductors grown on graphene 
lacks scalability. On the other hand, we expected that high-quality 
nitride semiconductors could be grown on h-BN with an AlN or 
AlGaN buffer with high scalability because h-BN is itself a nitride 
semiconductor and can be grown on a substrate uniformly. In an 
earlier study, polycrystalline BN films were used as buffer layers to 


grow GaN films on (001) Si substrates; however, the grown GaN was 
polycrystalline. In addition, the study was restricted to the use of BN 
as a buffer layer. Here, as a new release layer, we use single-crystal 
h-BN, which not only acts as a buffer layer for a nitride device but also 
allows us to release the device from the substrate and transfer it to a 
foreign substrate mechanically in a highly scalable way. This versatile 
approach has the potential to be scaled up to production size (several 
inches) at low cost. 

The single-crystal h-BN layer with an atomically flat surface enables 
us to grow an AlGaN/GaN heterostructure and InGaN-based multiple 
quantum well (MQW) and MQW LED (light-emitting diode) struc- 
tures and to release these structures from the host sapphire substrates 
and transfer them to foreign substrates. Figure 1 depicts our materials 
design and the release and transfer processes. First, we grew a single-crystal 
(0001) h-BN ultrathin layer on a (0001) sapphire substrate 
(Fig. 1a). The orientation relationship between the substrate and the 
h-BN is (0001)h-BN || (0001)sapphire, where the plane of boron and 
nitrogen bonded with sp² hybridization is parallel to the substrate 
surface. Then, the MQW structure (consisting of a 0.3-µm-thick 
AlGaN layer, a 3-µm-thick GaN layer, a ten-period InGaN/GaN 
MQW structure, and a 0.1-µm-thick GaN layer) was grown on the 
h-BN release layer (Fig. 1a). After the MOVPE growth, we flipped the 
MQW structure upside down and put it on a foreign substrate via an 
adhesive sheet (an indium sheet in this case) (Fig. 1b). Next, the MQW 
structure attached to the indium sheet on the foreign substrate was 
heated to a temperature sufficient to heat-seal the indium to the sapphire 
and the MQW (Fig. 1c). Finally, the MQW structure was released from 
the host sapphire substrate by mechanical force and the MQW was 
thereby transferred to the foreign substrate (Fig. 1d). The mechanical 
force easily separates the MQW from the host sapphire and the sepa- 
ration occurs within the h-BN release layer owing to van der Waals 
forces of the h-BN layered structure. In this way, MQWs and other 
types of nitride devices can be transferred to all kinds of substrates, 
such as silicon, polycrystalline metal, glass and transparent plastics. 

An AlN or AlGaN layer on the h-BN layer makes it possible to 
greatly improve the surface morphology and crystalline quality of 
nitride semiconductors grown on it. GaN directly grown on the 
h-BN layer has a rough and irregular island-shaped surface morphology, 
and is polycrystalline (Supplementary Fig. 1). Hence, we first 
grew a wurtzite AlN layer on the h-BN and then grew a GaN film 
on the AlN to overcome the difficulty of direct growth of GaN (Fig. 2a). 
In contrast to GaN grown on the h-BN directly, GaN grown on the 
AlN shows a step-like flat surface with measured root mean square 
(r.m.s.) roughness of 0.69 nm over an area of 5 × 5 µm², as seen in the 
atomic force microscopy (AFM) image in Fig. 2b. X-ray diffraction in 
the 2θ/ω configuration exhibits only GaN (0002) and AlN (0002) 
diffraction peaks (Fig. 2c), indicating that (0001) single-crystal GaN 
film has been grown on the (0001) AlN on the h-BN layer. The orientation 
relationship between the GaN, AlN, h-BN and the sapphire 
substrate is described in Supplementary Figs 2 and 3. The weak beam 
dark-field transmission electron microscopy (TEM) images in Fig. 2d 
and e reveal that the AlN layer, working as a dislocation filter, decreases 

Figure 1 | Schematic illustrations of the MQW materials design, release and 
transfer processes. a, Single-crystal h-BN release layer growth is performed on 
a sapphire substrate, followed by growth of the wurtzite AlGaN layer and the 
MQW structure. On the right, the crystal structure of h-BN is shown. The 
dotted lines mark the unit cell dimensions of h-BN. b, The MQW structure is 
flipped upside down and placed on a foreign substrate via an adhesive sheet 
(indium sheet). c, The MQW structure attached to the sheet is heated to a 
temperature sufficient to heat-seal the indium, and then the MQW structure is 
released from the host sapphire substrate by mechanical force and transferred 
to the foreign substrate (d). 


the density of threading dislocations in the GaN and that the predom- 
inant defects in the GaN are mixed dislocations with a density of 
8.6 X 10°cm~* (Supplementary Information part III). In the GaN 
film, the tilt and twist angles were respectively evaluated to be 0.16° 
and 0.47° from the full-width at half-maximum (FWHM) of the X-ray 
rocking curve profiles obtained by fitting the rocking curve profiles to a 
pseudo-Voigt function. Figure 2f shows a convergent beam electron 
diffraction (CBED) pattern experimentally obtained for the GaN 
layer and a simulated CBED pattern for Ga-terminated GaN. The good 
agreement between these patterns reveals that the GaN film has a Ga 
termination (Fig. 2f). In addition, to verify the device quality of the 
GaN, we grew an Alo »sGao 7gN layer with a thickness of 25 nm on the 
GaN and demonstrated that the AlGaN/GaN heterostructure with an 
area of two centimetres square has a two-dimensional electron gas 
mobility of 1,100 cm² V⁻¹ s⁻¹ with a sheet carrier density of 
1X 10%cm? at room temperature (293K). Taken together, the 
AFM, X-ray diffraction, cross-sectional TEM, X-ray rocking curve, 
CBED and mobility confirm that single-crystal wurtzite GaN of device 
quality grows on the h-BN layers. In addition, a flat single-crystal GaN 
layer can be grown on a single-crystal AlGaN layer on the h-BN 
(Supplementary Fig. 4). The single crystal h-BN layer works as the 
release layer. The AlN or AlGaN is the buffer, and the thick GaN film 
ensures device quality in spite of the large lattice mismatch. 

Figure 2 | Flat single-crystal wurtzite GaN is grown on an AlN layer on the 
h-BN layer. a, Schematic illustration. b, AFM image of the surface of the GaN 
film. c, X-ray diffraction using the 2θ/ω configuration for the GaN film. a.u., 
arbitrary units. d, e, Weak beam dark-field TEM images. g is the reciprocal 
lattice vector, and the arrows indicate the direction of the vectors. The numbers 
are Miller indices. f, Experimental and simulated CBED patterns. The numbers 
are Miller indices. 


Before fabricating the LED structure, we grew an InGaN/GaN 
MQW structure to investigate its optical properties. The photographs 
in Fig. 3a and b show the transferred AlGaN/GaN heterostructure, 
approximately 2 cm square, and the MQW structure, approximately 
5 mm square, on the indium sheets attached to the foreign sapphire 
substrates, respectively. We can see the surface of the indium sheet 
because the AlGaN and the MQW are transparent. The size of the 
transferred area can be controlled by the adhesive sheet sizes. It takes 
only a few seconds to release the host sapphire substrate from the 
AlGaN/GaN and MQW structure, which is an unusually speedy process 
compared to laser lift-off. The r.m.s. roughness of the indium sheet 
is 12 nm over an area of 5 µm × 5 µm. Some protrusions from the 
indium sheet are clearly visible, indicating that the AlGaN/GaN and 
MQW structures are mechanically released from the native sapphire 
substrate, as illustrated in Fig. 1. No cracks were observed in the 
transparent AlGaN/GaN structure up to a maximum size of about 
1 cm square, suggesting that the mechanical release process using 
the h-BN layer ensures minimal crack formation. The AFM image 
in Fig. 3c shows that the surface of the transferred structure is flat, 
with r.m.s. roughness of 0.95 nm over an area of 5 µm × 5 µm, which is 
considerably lower than that of surfaces separated by laser lift-off 
techniques (typically 10-60 nm). X-ray photoelectron spectroscopy 
spectra confirm that the separation actually occurs within the 
h-BN release layer along the plane of B and N bonding (Supplementary 
Fig. 5). X-ray diffraction (using the 2θ/ω configuration) measurements 
of the InGaN/GaN MQW before and after the transfer showed satellite 
peaks from the MQW up to first order, along with those for GaN 
(0002), AlGaN (0002) and sapphire (Fig. 3d). This indicates that the 

Figure 3 | Structural and optical properties of an InGaN/GaN MQW 
structure before and after transfer. a, Photograph of the transferred AlGaN/ 
GaN structure attached to the indium sheet on the foreign sapphire substrate. 
The smallest division in the scale at left is 500 µm. b, Photograph of the 
transferred MQW structure attached to the indium sheet on the foreign 
sapphire substrate. One division in the scale at left is 500 µm. c, AFM image of 
the surface of the transferred structure. d, X-ray diffraction using the 2θ/ω 
configuration for the InGaN/GaN MQW before and after the transfer. 

e, Photoluminescence spectra at room temperature for the MQW before and 
after transfer. Excitation wavelength, 375 nm. f, Raman spectra for the MQW at 
room temperature before and after transfer. 


MQW period (the sum of well and barrier thickness) is 5.5 nm and that 
the In content in the InGaN well is 15%. In Fig. 3d, the transferred 
MQW structure exhibits satellite peaks up to first order with almost 
the same intensities as those before the release. 
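
The link between satellite-peak positions and the MQW period quoted above can be illustrated with the standard superlattice relation Λ = (n_a − n_b)λ / [2(sin θ_a − sin θ_b)]. The sketch below is an illustration only: the X-ray wavelength (Cu Kα1) and the zeroth-order angle are assumptions, since neither is stated in the text, and the function names are hypothetical.

```python
import math

CU_KALPHA1_NM = 0.15406  # assumed X-ray wavelength (Cu K-alpha1), in nm


def satellite_theta(theta0_deg, period_nm, order, wavelength_nm=CU_KALPHA1_NM):
    """Bragg angle (degrees) of the satellite of a given order.

    Uses sin(theta_n) = sin(theta_0) + n * lambda / (2 * period).
    """
    s = math.sin(math.radians(theta0_deg)) + order * wavelength_nm / (2.0 * period_nm)
    return math.degrees(math.asin(s))


def period_from_satellites(theta_a_deg, theta_b_deg, order_a, order_b,
                           wavelength_nm=CU_KALPHA1_NM):
    """MQW period (nm) from the Bragg angles of two satellite orders."""
    delta_sin = math.sin(math.radians(theta_a_deg)) - math.sin(math.radians(theta_b_deg))
    return (order_a - order_b) * wavelength_nm / (2.0 * delta_sin)


# Hypothetical zeroth-order angle; the 5.5 nm period is the value from the text.
theta0 = 17.2
theta_plus = satellite_theta(theta0, 5.5, +1)
theta_minus = satellite_theta(theta0, 5.5, -1)
recovered = period_from_satellites(theta_plus, theta_minus, +1, -1)
print(f"first-order satellites at {theta_minus:.3f} and {theta_plus:.3f} deg "
      f"-> period {recovered:.2f} nm")
```

Because only first-order satellites are reported, the period rests on the spacing between the +1 and −1 peaks (or either one and the zeroth-order peak); the round trip above simply confirms that the relation recovers the input period.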

To check for degradation of the nitride semiconductor that might be 
created by release and mounting, we performed room temperature 
photoluminescence measurements on the MQW before and after 
the transfer. The excitation source was an InGaN-based semiconductor 
laser diode with an emission wavelength of 375 nm. 
Typical photoluminescence spectra from the MQW before and after 
transfer show strong luminescence at almost the same peak wave- 
length of 434 nm; however, the intensity after the transfer is compar- 
able to or stronger than that before the transfer (Fig. 3e). We also grew 
the same InGaN/GaN MQW structure on a GaN film using a standard 
low-temperature AlN buffer layer on sapphire substrate and performed 
room-temperature photoluminescence using a He-Cd laser. 
The luminescence intensity from the as-grown MQW structure on the 
h-BN release layer was comparable to or higher than that from the one 
grown on the low-temperature AlN buffer layer. The Raman spectrum 
for the MQW before the transfer displays the E₂ mode at 569 cm⁻¹ and 
the A₁ mode from the GaN film, and shows the Raman mode from the 
host sapphire substrate as well (Fig. 3f). The Raman spectrum of the 
MQW after the transfer shows the E₂ mode from the GaN film at 
567 cm⁻¹ with a downshift of 2 cm⁻¹ (Fig. 3f), indicating that the 
GaN before and after the release is compressively strained and 
unstrained, respectively. The X-ray diffraction, photoluminescence 
and Raman experiments demonstrate that the MQW structure retains 
its original crystal quality, even after the transfer processes. 
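
The 2 cm⁻¹ downshift of the E₂ mode can be turned into a rough stress estimate by assuming a linear relation between peak shift and biaxial stress. The sketch below does this in Python. The coefficient of about 4.2 cm⁻¹ per GPa is a commonly quoted literature value for GaN and is an assumption here, as is the function name; the source reports only the peak positions.

```python
def biaxial_stress_gpa(peak_strained_cm1, peak_relaxed_cm1,
                       coeff_cm1_per_gpa=4.2):
    """Estimate biaxial stress from the shift of the GaN E2 Raman mode.

    A positive result means compressive stress. The default coefficient is
    an assumed literature value, not a number taken from the source.
    """
    if coeff_cm1_per_gpa <= 0:
        raise ValueError("coefficient must be positive")
    return (peak_strained_cm1 - peak_relaxed_cm1) / coeff_cm1_per_gpa


# Peak positions reported in the text: 569 cm^-1 before, 567 cm^-1 after.
stress_before_release = biaxial_stress_gpa(569.0, 567.0)
```

With the assumed coefficient, the shift corresponds to roughly half a gigapascal of compressive stress in the as-grown film; a different coefficient scales the estimate proportionally.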

Next, we describe the electroluminescence emitted from the trans- 
ferred LED (Supplementary Fig. 6a) at room temperature. For com- 
parison, the same MQW LED structure was grown on a typical 
low-temperature AlN buffer layer on a sapphire substrate and the 
conventional MQW LED was fabricated without lift-off (Supplemen- 
tary Fig. 6b). Current-voltage characteristics of the transferred LED 
show clear rectification (Fig. 4a). In the electroluminescence spectra of 
the transferred LED and conventional LED with currents ranging from 
10 to 50 mA, the intensities of the electroluminescence increase with 
current (Fig. 4b and c) and the electroluminescence intensities from 
the transferred MQW LED were comparable to or higher than the 
intensities from the conventional MQW LED on the low-temperature 
AlN buffer layer at the same current. These higher electrolumines- 
cence intensities are caused by reflection from the back-side contact 
indium. The comparable intensities and almost the same FWHM of 
the electroluminescence of the conventional and transferred LED indi- 
cate that the MQW preserves its original quality after the transfer. We 
further succeeded in transferring a vertical-type LED (Supplementary 
Fig. 6c). This LED emits blue light at room temperature (Fig. 4d). 

Figure 4 | Electroluminescence of transferred and conventional MQW LED 
at room temperature. a, Current–voltage characteristics of the transferred 
LED. b, Electroluminescence spectra from the transferred LED with currents 
ranging from 10 to 50 mA. c, Electroluminescence spectra from the conventional 
LED with currents ranging from 10 to 50 mA. d, Optical image of the blue- 
light electroluminescence from the transferred vertical-type LED. 

Finally, we fabricated a battery-powered LED prototype. The 
released LED is about 2 mm square and 3.4 µm thick. A schematic 
cross-section of the LED prototype is shown in Fig. 5a. First, two 
T-shaped Pd/Au electrodes were deposited on two commercially avail- 
able laminate films. Then, the released LED structure with the Pd/Au 
electrodes was mounted on the T-shaped electrode of one laminate 
film and sandwiched with the other T-shaped electrode of the other 
laminate film using the indium contact layer. The released LED was 
thereby sandwiched between the two laminates, which were sealed 
together by heat. Figure 5b shows a top-view photograph of the LED 
prototype, which is 18 mm long and 22 mm wide. The side-view of the 
LED prototype anchored with tweezers shows a thickness of about 
200 µm (Fig. 5c), which can offer flexible and portable applications. 
The LED prototype emitted violet-blue electroluminescence at room 
temperature (Fig. 5d). 

Compared with conventional laser lift-off, mechanical release using 
h-BN has several advantages. First, this approach requires no lift-off 
equipment and no chemical etchant, leading to considerable cost 
reduction. Second, the transfer in this approach is completed within 
several seconds because the separation harnesses the van der Waals 
forces of the h-BN release layer. In contrast, laser lift-off requires laser 
scanning with additional adjustment of the laser. Third, the process is 
simple: the mechanical force releases GaN-based structures in only one 
step. In laser lift-off, the laser must be scanned step-by-step, then the 
irradiated samples must be heated to the melting point of Ga, and 
finally the GaN sample after laser irradiation must be cleaned with a 
dilute acid (such as HCl) to remove material residues. The separated 
surface after the mechanical release is essentially flat, with an r.m.s. 

Figure 5 | Electroluminescence of LED prototype at room temperature. 
a, Schematic cross-section of the LED prototype. b, c, Top-view (b) and side- 
view (c) photographs of the LED prototype. The T-shaped black Pd electrode 
surface on one laminate film is on the left-hand side; the gold electrode on the 
other is on the right-hand side. The LED structure was mounted between the 
top of the two T-shaped electrodes. d, Optical image of the violet-blue 
electroluminescence from the LED prototype. 

roughness of 1 nm, which is considerably lower than that obtained 
using laser lift-off. The process we report here opens the way to releas- 
ing and transferring a wide range of nitride semiconductor devices to 
large-area, flexible and affordable substrates. 


METHODS SUMMARY 


The h-BN release layer, the AlGaN/GaN heterostructure, the InGaN/GaN 
MQW and the MQW LED structures were grown by MOVPE. Triethylboron, 
trimethylgallium, trimethylaluminium, triethylgallium and trimethylindium were 
the group III sources and ammonia was the group V source. Silane and bis- 
cyclopentadienyl-magnesium were the n-type and p-type dopant gases, respec- 
tively, with hydrogen or nitrogen carrier gas. Electron beam deposition provided 
Au, Pd and Al electrodes. A high-resolution X-ray diffractometer (Philips X’Pert 
System) with a copper target was used to evaluate the structural quality and 
perform X-ray diffraction pole figure measurement. The photoluminescence mea- 
surements were performed with an InGaN-based laser at a wavelength of 375 nm 
and a He-Cd laser at room temperature. We used Raman scattering (Renishaw 
system) with an excitation laser wavelength of 534 nm. We used an electron 
cyclotron resonance plasma etching system with chlorine to fabricate LEDs. 

Aerosols implicated as a prime driver of 
twentieth-century North Atlantic climate variability 


Ben B. B. Booth1, Nick J. Dunstone1, Paul R. Halloran1*, Timothy Andrews1 & Nicolas Bellouin1 


Systematic climate shifts have been linked to multidecadal variability 
in observed sea surface temperatures in the North Atlantic Ocean’. 
These links are extensive, influencing a range of climate processes such 
as hurricane activity” and African Sahel*° and Amazonian’ droughts. 
The variability is distinct from historical global-mean temperature 
changes and is commonly attributed to natural ocean oscilla- 
tions’ ’°. A number of studies have provided evidence that aerosols 
can influence long-term changes in sea surface temperatures’””, 
but climate models have so far failed to reproduce these inter- 
actions® and the role of aerosols in decadal variability remains 
unclear. Here we use a state-of-the-art Earth system climate model 
to show that aerosol emissions and periods of volcanic activity 
explain 76 per cent of the simulated multidecadal variance in 
detrended 1860-2005 North Atlantic sea surface temperatures. 
After 1950, simulated variability is within observational estimates; 
our estimates for 1910-1940 capture twice the warming of previous 
generation models but do not explain the entire observed trend. 
Other processes, such as ocean circulation, may also have contributed 
to variability in the early twentieth century. Mechanistically, we find 
that inclusion of aerosol-cloud microphysical effects, which were 
included in few previous multimodel ensembles, dominates the 
magnitude (80 per cent) and the spatial pattern of the total surface 
aerosol forcing in the North Atlantic. Our findings suggest that 
anthropogenic aerosol emissions influenced a range of societally 
important historical climate events such as peaks in hurricane 
activity and Sahel drought. Decadal-scale model predictions of 
regional Atlantic climate will probably be improved by incorporat- 
ing aerosol-cloud microphysical interactions and estimates of 
future concentrations of aerosols, emissions of which are directly 
addressable by policy actions. 

An understanding of North Atlantic sea surface temperature 
(NASST) variability is critical to society because historical Atlantic 
temperature changes are strongly linked to the climate, and its impacts, 
in neighbouring continental regions. For example, strong links 
between NASST variability and periods of African Sahel drought are 
found in observations*'* and physical climate models**"*. Similar 
covariation between NASSTs and rainfall in eastern South America 
has been found’, as have links to changes in both mean rainfall’* and 
rainfall extremes'®, Atlantic hurricane activity*’®'* and European 
summer climate®. These changes are not solely limited to the regions 
bordering the Atlantic, but also have links to Indian monsoon rainfall", 
Arctic and Antarctic temperatures’, Hadley circulation’, El Niño/ 
Southern Oscillation'’ and relationships between El Niño/Southern 
Oscillation and the Asian monsoon”. 

A link between multidecadal variability in NASST and circulation 
changes internal to the ocean was first proposed in 1964 (ref. 20) and 
later named the Atlantic Multidecadal Oscillation”'. This variability is 
often characterized as the detrended NASST between the equator and 
latitude 60° N (longitude 7.5–75° W; ref. 8). Although it has recently 
been questioned”, the present consensus remains that most of the 
observed Atlantic temperature variations occur in response to the 


ocean’s internal variability. This picture emerged from general circula- 
tion models, a number of which inherently produce multidecadal 
Atlantic variability in the absence of external climate forcing’ and, 
when considered together as a multimodel mean, have shown 
little evidence of forced changes projecting onto the NASSTs*”. 
Observationally, this interpretation has been accepted because the 
Atlantic temperature changes seem to be oscillatory, both around 
any secular long-term trend and when calculated as anomalies from 
the global-mean change. 

Motivated by the recent identification of the importance of aerosol 
process complexity in interhemispheric Atlantic temperature 
changes”, apparent aerosol correlation’”’ and volcanic modulation 
of Atlantic variability”, we use new general circulation model simula- 
tions to question whether the CMIP3 (Climate Model Intercomparison 
Project phase 3) models contained the complexity necessary to repres- 
ent a forced Atlantic Multidecadal Oscillation”®. We use HadGEM2-ES 
(the Hadley Centre Global Environmental Model version 2 Earth 
System configuration”), a next-generation CMIP5 (Climate Model 
Intercomparison Project phase 5) model, which represents a wider 
range of Earth system processes (in particular aerosol interactions”) 
than do CMIP3 models. 

To separate internal variability from forced changes, we present 
climate model ensemble-mean NASSTs, averaged over parallel model 
simulations started from different initial conditions’. If external 
forcing dominates the NASST evolution then ensemble members will 
evolve in phase and thus combine to produce a robust ensemble-mean 
response. If internal ocean dynamics dominate then each member will 
evolve separately and the resulting ensemble mean will show little 
residual variation around the underlying warming trend. This 
approach allows identification of physical mechanisms linking forced 
changes to Atlantic temperatures and was used in previous CMIP3 
studies®”. 
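
The logic of the ensemble-mean test can be illustrated with synthetic data. The Python sketch below builds members that share one forced signal but carry an internal oscillation with member-specific phase, then compares the ensemble means. The sinusoidal signals, amplitudes, 60-yr period and evenly spaced phases are all hypothetical choices for illustration and are not the model's behaviour.

```python
import math
import statistics


def member(years, forced_amp, internal_amp, phase, period=60.0):
    """One synthetic member: shared forced signal plus phase-shifted internal one."""
    return [
        forced_amp * math.sin(2 * math.pi * t / period)
        + internal_amp * math.sin(2 * math.pi * t / period + phase)
        for t in years
    ]


def make_ensemble(years, n_members, forced_amp, internal_amp):
    """Members differ only in the phase of their internal oscillation."""
    phases = [2 * math.pi * i / n_members for i in range(n_members)]
    return [member(years, forced_amp, internal_amp, p) for p in phases]


def ensemble_mean(members):
    """Average across members at each time step."""
    return [statistics.fmean(values) for values in zip(*members)]


years = list(range(1860, 2006))

# Case 1: forcing dominates, so members evolve in phase.
forced_mean = ensemble_mean(make_ensemble(years, 4, 0.2, 0.05))
# Case 2: internal variability only, so member phases differ.
internal_mean = ensemble_mean(make_ensemble(years, 4, 0.0, 0.2))

forced_sd = statistics.pstdev(forced_mean)
internal_sd = statistics.pstdev(internal_mean)
```

In this idealized set-up the internal oscillations cancel exactly because the phases are evenly spaced; in a real ensemble with few members the cancellation is only partial, which is why ensemble size matters.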

In Fig. 1a, we reproduce the multimodel-mean NASST response of 
the six CMIP3 models used in ref. 9 (ENS1, blue) and the eleven 
models used in ref. 6 (ENS2, green) (Supplementary Table 2). The 
observations (Fig. 1) show marked multidecadal variations. The 
multimodel-mean responses in both ENS1 and ENS2 capture the 
underlying trend through the century, but they capture only weak multi- 
decadal variability. For example, the ensembles’ 1950–1975 cooling is 
only a small fraction of the observed value (Fig. 1a and Supplementary 
Fig. 4). Therefore, the unexplained multidecadal signal was previously 
attributed to internal ocean variability*”. 

By contrast, HadGEM2-ES (Fig. 1b) reproduces much more of the 
observed NASST variability (correlation, 0.65; 75% of detrended 
standard deviation (smoothed over 10-yr intervals to highlight the multi- 
decadal component)). The post-1950s cooling and subsequent warm- 
ing now fall within the observed trends (Supplementary Table 1). 
Observed warming in the earlier period (1910-1940) is larger than 
simulated by HadGEM2-ES (Fig. 1b and Supplementary Table 1); 
however, these new simulations capture roughly twice the early- 
twentieth-century warming of previous CMIP3 generation models. 


1Met Office Hadley Centre, FitzRoy Road, Exeter EX1 3PB, UK. 
*These authors contributed equally to this work. 

Figure 1 | Atlantic surface temperatures. Comparison of the area-averaged 
North Atlantic SSTs (defined as 7.5–75° W and 0–60° N), relative to the 1901– 
1999 average, of an observational estimate (the US National Oceanic and 
Atmospheric Administration’s Extended Reconstructed SST (ERSST), black) 
and two published CMIP3 model composites (ENS1, ref. 9, blue; ENS2, ref. 6, green; 
a); the HadGEM2-ES model (orange; shading represents 1 s.d. of the model 
ensemble spread; b); and two recomposites from CMIP3, the first with models 

This points to a larger forced role in this period. Other processes not 
represented by our ensemble-mean response (such as ocean dynamical 
changes) may also contribute to this early trend. 
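
The two summary statistics quoted for HadGEM2-ES (the correlation with observations and the fraction of the detrended standard deviation) can be computed along the following lines. This Python sketch assumes the series are linearly detrended first and then smoothed with a 10-yr running mean; the order of operations, the function names and the synthetic test series are assumptions, not details given in the text.

```python
import math


def detrend(values):
    """Remove a least-squares linear trend (and the mean) from a series."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    slope = (sum((i - t_mean) * (v - v_mean) for i, v in enumerate(values))
             / sum((i - t_mean) ** 2 for i in range(n)))
    return [v - (v_mean + slope * (i - t_mean)) for i, v in enumerate(values)]


def running_mean(values, window=10):
    """Unweighted running mean; output is shorter by window - 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def std(x):
    """Population standard deviation."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))


def compare(model, obs, window=10):
    """Return (correlation, model/obs ratio of smoothed detrended s.d.)."""
    m = running_mean(detrend(model), window)
    o = running_mean(detrend(obs), window)
    return pearson(m, o), std(m) / std(o)


# Synthetic series: same multidecadal shape, model amplitude 75% of "observed".
obs = [0.02 * i + math.sin(2 * math.pi * i / 60.0) for i in range(146)]
model = [0.01 * i + 0.75 * math.sin(2 * math.pi * i / 60.0) for i in range(146)]
correlation, sd_ratio = compare(model, obs)
```

Because detrending and smoothing are both linear operations, a model series that is a scaled copy of the observed variability returns a correlation of one and a ratio equal to the scale factor, regardless of the trends.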

In examining why the HadGEM2-ES ensemble reproduces the 
observed NASST variability better than previous multimodel studies 
have done®’ (Fig. 1a, b), we can discount the possibility that the 
HadGEM2-ES variability is predetermined, because the initial condi- 
tions were selected to sample different phases of Atlantic variability”. 
Furthermore, an additional HadGEM2-ES ensemble that omits 
changes in aerosol emissions neither has the same multidecadal vari- 
ability as the all-forcings ensemble nor reproduces the observed 
NASSTs (Fig. 2a). 

Replication of a large fraction of the observed NASST variability by 
HadGEM2-ES allows us to identify forcings and mechanisms, consist- 
ent with the observed variability, within the model framework. 
Variability of ensemble-mean NASST from historical simulations 
including time-varying aerosol emissions is strongly correlated with 
variability in simulated net surface shortwave radiation (Fig. 2b), 
which in turn has the same temporal structure as variability in aerosol 
optical depth changes (Fig. 2c) and periods of volcanic activity (Fig. 2d). 
Other terms in the surface heat budget (Supplementary Fig. 2) have a 
role in the simulated NASST change. However, it is the surface short- 
wave component that produces the dominant multidecadal variations. 



that represent only direct aerosol effects (mean of five contributing models, red) and 
the second with models representing both indirect effects interactively (three 
models, blue) (c). In all panels, trends for 1950–1975 (K per decade) are 
shown. The error estimates are based on the s.d. of the 25 trends between a 5-yr 
period (1948–1952) at the start of this interval and a 5-yr period (1973–1977) at 
the end. All data have been latitude-weighted when calculating area averages. 
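
The trend uncertainty described in the caption can be sketched as follows: one trend is computed for every pairing of a start year in 1948–1952 with an end year in 1973–1977, giving 25 trends whose standard deviation serves as the error estimate. The Python sketch assumes each trend is a least-squares fit over the years from start to end inclusive; the caption does not say how the individual trends were computed, and the function names and test series are hypothetical.

```python
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def endpoint_trends(series, start_years=range(1948, 1953),
                    end_years=range(1973, 1978)):
    """Trends (K per decade) for every start/end pairing.

    `series` maps year -> temperature anomaly in K.
    """
    trends = []
    for start in start_years:
        for end in end_years:
            years = list(range(start, end + 1))
            temps = [series[y] for y in years]
            trends.append(10.0 * ols_slope(years, temps))
    return trends


# Synthetic series cooling at exactly 0.2 K per decade, for illustration only.
synthetic = {year: -0.02 * (year - 1950) for year in range(1940, 1990)}
trends = endpoint_trends(synthetic)
trend_mean = statistics.fmean(trends)
trend_sd = statistics.stdev(trends)
```

For a perfectly linear series the 25 trends coincide and the spread vanishes; in real data the spread reflects how sensitive the trend is to the choice of end points.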


Volcanoes and aerosols respectively explain 23 and 66% of the temporal 
(10-yr-smoothed) multidecadal variability of the detrended NASST 
(Supplementary Fig. 5). Combining both contributions explains 76% 
(80% after inclusion of mineral dust aerosols) of the simulated variance. 
Inclusion of mineral dust processes may potentially be important 
because emissions are known to respond to North-Atlantic-driven 
changes in Sahel rainfall, and thus represent an important positive 
feedback on NASSTs in the real world’’. The lack of a multidecadal 
dust signal (Supplementary Information) in HadGEM2-ES simulations 
suggests that we are likely to be underestimating the magnitude of the 
forced Atlantic response. 
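
The fractions of variance quoted above are the kind of quantity returned by a linear regression of the NASST series on the forcing series. The Python sketch below computes the explained variance (R²) for one predictor and for two predictors combined. It assumes an ordinary least-squares fit, which the text does not confirm as the method used; the series and function names are hypothetical.

```python
def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def r2_single(x, y):
    """Fraction of the variance of y explained by a linear fit on x."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) ** 2 / (_dot(xc, xc) * _dot(yc, yc))


def r2_two_predictors(x1, x2, y):
    """Fraction of the variance of y explained by a linear fit on x1 and x2."""
    a, b, yc = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(a, a), _dot(b, b), _dot(a, b)
    s1y, s2y, syy = _dot(a, yc), _dot(b, yc), _dot(yc, yc)
    det = s11 * s22 - s12 ** 2          # zero if predictors are collinear
    b1 = (s1y * s22 - s2y * s12) / det  # normal-equation solution
    b2 = (s2y * s11 - s1y * s12) / det
    return (b1 * s1y + b2 * s2y) / syy


# Hypothetical optical-depth-like predictors and a response built from them.
aerosol = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
volcanic = [0.0, 0.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
sst = [-(0.5 * a + 0.2 * v) for a, v in zip(aerosol, volcanic)]

r2_aerosol = r2_single(aerosol, sst)
r2_volcanic = r2_single(volcanic, sst)
r2_combined = r2_two_predictors(aerosol, volcanic, sst)
```

When the two predictors are correlated with each other, the single-predictor fractions need not add up to the combined fraction, which is consistent with 23% and 66% combining to 76% rather than 89%.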

The volcanic influence on Atlantic variability has been demon- 
strated previously’**’. We focus on the anthropogenic aerosol com- 
ponent of the shortwave changes identified here as driving the model’s 
multidecadal NASST variability. Aerosol concentration changes influ- 
ence the spatial response (Fig. 3) of NASST as well as its temporal 
evolution. Prevailing winds advect aerosols emitted in industrial North 
America in a band across the North Atlantic that mixes with polluted 
air masses over Europe before being transported by trade winds south 
and west. The large-scale pattern of shortwave change is explained by 
the effect of cloud microphysical response to these changes in aerosol 
concentration. The shortwave variability largely occurs where aerosol 
changes coincide with large-scale cloud distribution. On a regional 

Figure 2 | External forcing of surface temperature and surface shortwave 
radiation linked to aerosol and volcanic changes. a, Ensemble-mean NASST 
(7.5-75° W and 0-60° N) simulated by the HadGEM2-ES model considering all 
available external climate forcings (red/blue), and all forcings except 
anthropogenic aerosol emissions (black). b, Ensemble-mean shortwave (SW) 
radiation entering ocean (red/blue) alongside reconstruction of this shortwave 
radiation based on a linear relationship with total anthropogenic aerosol and 


scale, coupled processes, such as temperature feedbacks, can lead to an 
enhanced local response (Supplementary Information). The same 
horseshoe-shaped signature is seen in shortwave and NASST variability 
(Fig. 3). The map of NASST change between warm and cold phases of 
multidecadal variability is consistent with the observed variations in 
SST (Fig. 3). 

So far, consistent spatial and temporal changes between aerosol 
burden, shortwave and NASST have been presented. It is not clear, 
however, whether these changes are externally forced by aerosols or are 
mediated by ocean circulation. Here we present a parallel simulation of 
the historical period, driven by identical emission and concentration 
changes, but with the SSTs explicitly fixed at their 1860 climatological 
values. The shortwave changes arising in this parallel experiment share 
the temporal structure and magnitude of the shortwave changes from 
our standard historical simulations (Fig. 4a). By explicitly removing 
any feedback from SST change on shortwave, we demonstrate that 
simulated historical shortwave variability arises directly from aerosol 
and volcanic forcing of the surface radiation and is not mediated by 
ocean circulation change. 

One of the reasons why the role of aerosols in driving multidecadal 
variability has not previously been identified is the level of aerosol 
physics represented in climate models at the time of the CMIP3 multi- 
model comparison project (Supplementary Table 2). Although all the 
CMIP3 models represented the direct effect of aerosols on shortwave 
radiation, most omitted or only partly represented the indirect aerosol 
effects”. 

volcanic optical depth changes (green). The linear model explains 79% of the 
simulated variance. c, Change in total (red/blue shading) and individual species 
(coloured lines) of anthropogenic aerosol optical depth (degree of absorption/ 
scattering) over the North Atlantic. d, Volcanic optical depth from ref. 30 as 
implemented in HadGEM2-ES simulations. In a-c, red and blue shading 
represents values above or below (or vice versa) a least-squares linear fit to the data. 


Recently, albedo differences in CMIP3 aerosol representation have 
been shown to be important for simulating SST changes”*: models that 
represent indirect aerosol effects capture more of the observed Atlantic 
interhemispheric change than those that do not. Recompositing the 
models used in Fig. 1a into those with only direct aerosol effects, and 
those that also include the first indirect effect interactively, shows some 
evidence of multidecadal variability (Fig. 1c), illustrating that aerosol- 
cloud microphysical processes have a role even in previous-generation 
models. These models do not, however, reproduce the magnitude of 
the multidecadal NASST of HadGEM2-ES. 

In HadGEM2-ES, the aerosol indirect effects account for 80% or 
more of the total aerosol forcing in the North Atlantic region (Fig. 4b 
and Supplementary Fig. 1). Although there is some discussion of the 
magnitude of the indirect effects**’’, omission of these processes will 
lead to an underestimation of the modelled aerosol impact on the 
NASST. Looking at the relative roles of the first and second indirect 
effects (using changes in optical depth and cloud effective radius as 
respective metrics for these effects; Fig. 4c), we see a more pronounced 
response to early-twentieth-century variations for the indirect effect 
(effective radius) due to higher sensitivity of cloud albedo changes to 
changes in aerosol number in cleaner conditions’. In all, the inclusion 
of aerosol indirect effects in HadGEM2-ES magnifies the shortwave 
and, hence, the NASST response to aerosols, as well as influencing the 
spatial and temporal character of the historical multidecadal variability. 
We also note that although some climate models (such as HadGEM2- 
ES) reproduce the observed sensitivity of cloud albedo to changes in 

Figure 3 | Differences in spatial response between warm and cold Atlantic 
phases. Differences between the average of warm years (warmest third of the 
data) and the average of cold years (coldest third of the data), after the data have 
been detrended, show how patterns of multidecadal variability in aerosol 
burden (a) interact with the climatological cloud field (b) to influence the 
pattern of net surface shortwave radiation change (c) and, hence, NASSTs 
(d). The pattern of aerosol burden changes is linked to emissions in industrial 
North America and Europe by the climatological wind field (the direction and 
magnitude of which is indicated by the arrows). The exception is the localized 
increase in aerosol burden (Canadian coast) driven by increases in sea salt 
aerosols (warm years reduce sea ice extent in this area). The warm phase/cold 
phase SST pattern simulated by the model (d) agrees well with the observed 
change (e). Vertical axes show latitude; horizontal axes show longitude. 


aerosol optical depth in maritime regions, not all parameterizations of 
aerosol indirect effects do so (Fig. 2e in ref. 29). 

We have shown that volcanic and aerosol processes can drive pro- 
nounced multidecadal variability in historical NASST, which improves 
agreement with (for the early twentieth century) or reproduces (for the later 
period) the observed historical trends. In these simulations, it is the 
inclusion of aerosol indirect effects that allows us to capture the mag- 
nitude and the temporal and spatial structure of SST variability. Our 
results show that volcanoes and, crucially (from a policy and climate 
impact perspective), anthropogenic emissions of aerosols can drive 
NASST variability resembling that which is observed. This work suggests 
that we need to reassess the current attribution to natural ocean vari- 
ability of a number of prominent past climate impacts linked to NASSTs, 
such as Sahel drought. 

Figure 4 | Magnitude and origin of forced changes in net surface shortwave 
radiation. a, Ensemble-mean time evolution of surface shortwave in the North 
Atlantic region (red; see also Fig. 2b) and the surface shortwave from a parallel 
simulation in which the SSTs were held fixed at climatological values from 1860 
(blue). The comparison shows that these shortwave changes are externally 
forced. These long-term trends in surface shortwave match the changes observed 
in a snapshot experiment (2000 aerosol emissions, 1860 SST; blue asterisk), 
indicating that the underlying trend is consistent with that expected from aerosol 
changes alone. b, Spatial patterns of surface shortwave forcing arising from the 
2000 direct effect (left) and the first indirect effect (right) of aerosols 
(Supplementary Information), which illustrate the dominant role of indirect 
effects in the total forcing and the spatial distribution. c, Detrended time series of 
aerosol optical depth (black) and cloud-top effective radius (green) from the 
coupled simulations, which are indicators of the temporal evolution of direct 
and, respectively, indirect effects. Although the variations in effective radius are 
largely in phase with those in optical depth, there is greater divergence (implying 
a larger role for indirect effects) in the early historical period. 


METHODS SUMMARY 

The climate model used in this study is HadGEM2-ES”™. This model is notable for 
the number of climate-biogeochemical interactions that are interactively calcu- 
lated rather than specified in advance. Of relevance to this work, HadGEM2-ES 
models the supply of oxidants, an important component for aerosol formation, 
and mineral dust aerosols interactively, and yields improved predictions of bio- 
mass and carboniferous aerosol properties. Source terms for natural aerosols (or 
precursors) and mineral dust are also modelled interactively. Each simulation in 
the ensemble is forced with driving data (greenhouse gases, aerosols, volcanoes, 
land use and solar changes) based on historical data sets compiled for CMIP5 
simulations. These data sets and their implementation within this model are 
extensively documented in ref. 26. Volcanic forcing is prescribed in latitudinal 
bands. Over the North Atlantic, the magnitude of optical depth changes is pre- 
scribed individually for the bands spanning 0–30° N and 30–90° N, capturing the 
differences between tropical and extratropical volcanoes. Individual members of 
the ensemble were initiated from a control simulation using start points located 
50 yr apart (ref. 26). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


The central finding of this paper (that changes in volcanic and aerosol forcing are 
capable of driving variability in NASSTs much like that observed) is based on an 
ensemble of four HadGEM2-ES historical simulations (Methods Summary). The 
following parallel simulations were also used. 

• An ensemble of three HadGEM2-ES simulations (parallel to the first three 
members of the all-forcings ensemble) with no changes in aerosol emissions. 
Results of this experiment are shown in Fig. 2 and Supplementary Fig. 2. These 
simulations used identical driving data to the standard historical ensemble, pre- 
scribing changes in emissions and concentrations based on the CMIP5 historical 
data sets. The exception is the anthropogenic aerosol emissions, which (along with 
the surface chemistry and consequent contribution to aerosol oxidation) were kept 
constant at their 1860 values. This ensemble provides a comparison with historical 
NASSTs where the historical changes in aerosol emissions did not take place. 

• A parallel historical simulation of HadGEM2-ES in which the annual cycle of 
global SSTs is held at 1860 values, as calculated from the pre-industrial control 
simulation. Results of this experiment are shown in Fig. 4. This simulation used 
identical driving data to the standard historical ensemble and is designed to 
characterize the evolving nature of the historical forcings. Radiative forcing is 
the impact on the radiative balance resulting from a change or changes in the 
atmospheric constituents, or other external change (such as solar), before any SST- 
driven feedback on that change. This parallel run with fixed SSTs provides 
information on the shortwave changes (or the changes in any other radiative 
component) due directly to changes in atmospheric concentrations, explicitly 
removing any contribution arising from SST-driven feedbacks. 

• As a companion experiment to the fixed-SST historical simulation (described 
above), snapshot experiments were carried out to assess the impact of aerosol changes 
alone between the beginning and end of this simulation. This set-up used SSTs, and all 
forcings other than anthropogenic aerosol emissions, held at their 1860 values. 
Anthropogenic aerosol emissions were set to either their 2000 or 1860 values. The 
comparison of the two allows an estimate to be made of the radiative changes 
(including shortwave) between 1860 and 2000 that arise directly from aerosol changes, 
rather than, for example, as a feedback to the warming SSTs. The result of this 
experiment is shown in Fig. 4 alongside the 
fixed-SST historical simulation (in which other atmospheric constituents also varied). 

• Three snapshot experiments in which the model code calculating the instant- 
aneous radiation balance was run twice at each forward step of the model, once 
with the relevant aerosol process included and once without, to quantify sepa- 
rately the radiative impact of the direct and the first indirect effects of aerosols. The 
three snapshot experiments used aerosol emissions from 1950, 1980 and 2000 and 
calculated the surface shortwave radiative impact (forcing) of aerosols in those 
three years. These are presented in Supplementary Fig. 1 and the 2000 snapshot is 
included in Fig. 4. The value of these runs is that they allow the radiative impacts of 
these two aerosol effects to be compared. 
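
The double radiation call used in the last set of snapshot experiments can be illustrated with a toy example. In the Python sketch below the radiation routine is evaluated twice per step, once with the aerosol process and once without, and the difference is recorded as the forcing while only the full calculation would be passed on to the model. The exponential attenuation law, the function names and the input values are hypothetical stand-ins and do not represent the HadGEM2-ES radiation scheme.

```python
import math


def surface_shortwave(insolation, optical_depth, include_aerosol=True):
    """Toy surface shortwave (W m^-2): exponential attenuation by aerosol."""
    tau = optical_depth if include_aerosol else 0.0
    return insolation * math.exp(-tau)


def run_with_double_call(insolation, optical_depths):
    """Diagnose aerosol forcing at every step with a second radiation call.

    Returns (shortwave_seen_by_model, forcing) as two lists.
    """
    seen_by_model, forcing = [], []
    for tau in optical_depths:
        full = surface_shortwave(insolation, tau, include_aerosol=True)
        clean = surface_shortwave(insolation, tau, include_aerosol=False)
        seen_by_model.append(full)     # this call advances the model state
        forcing.append(full - clean)   # diagnostic only, no feedback
    return seen_by_model, forcing


# Hypothetical optical depths for three steps; the first has no aerosol.
shortwave, forcing = run_with_double_call(200.0, [0.0, 0.05, 0.10])
```

Because both calls see the same atmospheric state at each step, the difference isolates the radiative effect of the process itself rather than any response of the climate to it.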

The HadGEM2-ES model, like others before it?, captures SST variability in 
simulations unforced by external factors (fig. 20 in ref. 25), which, in unforced 
pre-industrial simulations, is strongly correlated with variability in Atlantic 
meridional overturning circulation (0.65, using 10-yr smoothing). However, var- 
iations in circulation are less important in the ensemble-mean variability of the 
historical simulations, where shortwave forcing dominates. This is discussed 
further in the section on shortwave changes and the surface heat budget in 
Supplementary Information. 

Adaptation at the output of the chemotaxis signalling pathway 


Junhua Yuan1, Richard W. Branch1, Basarab G. Hosu1 & Howard C. Berg1 


In the bacterial chemotaxis network, receptor clusters process 
input’, and flagellar motors generate output*. Receptor and motor 
complexes are coupled by the diffusible protein CheY-P. Receptor 
output (the steady-state concentration of CheY-P) varies from cell to 
cell’. However, the motor is ultrasensitive, with a narrow operating 
range of CheY-P concentrations®. How the match between receptor 
output and motor input might be optimized is unclear. Here we 
show that the motor can shift its operating range by changing its 
composition. The number of FliM subunits in the C-ring increases 
in response to a decrement in the concentration of CheY-P, increas- 
ing motor sensitivity. This shift in sensitivity explains the slow 
partial adaptation observed in mutants that lack the receptor 
methyltransferase and methylesterase”* and why motors show 
signal-dependent FliM turnover’. Adaptive remodelling is likely 
to be a common feature in the operation of many molecular 
machines. 

The chemotaxis signalling pathway allows bacterial cells to sense 
and respond to changes in concentrations of chemical attractants or 
repellents’. Binding of chemicals by receptors modulates the activity 
of an associated histidine kinase, CheA, thereby changing the level of 
phosphorylation of the response regulator, CheY. CheY-P binds to 
FliM, a component of the switch complex at the base of the flagellar 
motor and modulates the direction of motor rotation. A phosphatase, 
CheZ, dephosphorylates CheY-P. The chemotaxis pathway is well 
known for its high gain*’°"’, wide dynamic range’’’* and robust 
adaptation®’*, mediated by receptor methylation and demethylation 
(by CheR and CheB). 

The output of the chemotaxis pathway, the flagellar motor, is ultra- 
sensitive to the intracellular concentration of CheY-P, with a Hill 
coefficient of about 10, imposing a narrow operational range for 
[CheY-P]*®. Whereas precise adaptation is a robust property of the 
chemotaxis pathway for certain attractants, for example aspartate, 
the steady-state concentration of CheY-P is not®. Given cell-to-cell 
variations in the concentration of CheY-P and the fact that different 
cells can maintain their chemotactic sensitivity”, it has been suggested 
that cells might have additional molecular mechanisms to adjust the 
CheY-P concentration around the operational value of approximately 
3 µM®. One possibility is a feedback mechanism that allows a cell to 
adjust its kinase activity in response to motor output. This mechanism 
would increase the kinase activity if cells only ran, and would decrease 
the kinase activity if cells only tumbled. In earlier work, we looked for 
such a mechanism by monitoring the kinase activity with a fluor- 
escence resonance energy transfer (FRET) technique’* while jamming 
flagellar bundles with an anti-filament antibody. Stopping motors had 
no effect on kinase activity’®. 

Here we report that the motor itself adapts, shifting its response 
function according to the steady-state concentration of CheY-P. It 
does this by increasing the complement of FliM when the concentra- 
tion of CheY-P is low. Motor remodelling is well known for the stator 
elements MotA and MotB, which if defective, can be replaced by wild- 
type protein, as evidenced by stepwise increments in motor torque'””’. 
Such exchange also has been visualized by total internal reflection 
fluorescence (TIRF) microscopy of green fluorescent protein (GFP)-labelled 
protein”’, and a similar technique has been used to demonstrate 
FliM’ and FliN*® exchange in cells containing CheY-P. The 
present work addresses the functional consequences of FliM exchange. 
We studied cheR cheB cells, which are defective in methylation and 
demethylation, and monitored motor and kinase responses to step-addition 
of the non-metabolizable attractant α-methylaspartate 
(MeAsp), using bead*' and FRET” assays. These experiments cannot 
be done with wild-type cells because their adaptation to aspartate is 
robust, so that the steady-state concentration of CheY-P does not 
change. Motor adaptation occurs on a minute rather than on a second 
timescale and does not play a direct role in sensing temporal gradients. 
Instead, it helps to match the operating point of the motor to the 
output of the chemotaxis receptor complex, obviating the requirement 
for fine-tuning of that output. 

Figure 1 | Motor responses to stepwise addition of chemical attractants 
monitored by the bead assay. The attractants were applied at the times 
indicated by the arrows. Error bounds for standard errors of the mean are 
shown as dotted lines. a, Averaged responses of seven cheR cheB cells (JY35 
carrying pKAF131) to 1 mM MeAsp (weak attractant). b, Averaged responses 
of four cheR cheB cheZ cells (JY32 carrying pVS7 and pKAF131) to 2 mM 
MeAsp + 0.5 mM L-serine (strong attractant). 

Using a bead assay, we found partial adaptation in cheR cheB cells 
within 1 min following the initial response (Fig. 1a, which shows the 
averaged responses of seven motors on different cells to stepwise addition 
of 1 mM MeAsp). These results are similar to those obtained previously 
with tethered cells’*. A recent model suggests that partial 
adaptation might be due to dynamic localization of CheZ”. To test 
this hypothesis, we repeated the bead experiments using cheR cheB 
cheZ cells. The results were essentially the same; Fig. 1b shows the 
averaged responses of four motors on different cells of a cheR cheB 
cheZ strain to stepwise addition of 2 mM MeAsp + 0.5 mM L-serine 
(a stronger stimulus was needed because of the lower sensitivity of cheZ 
strains). So CheZ is not required for this partial adaptation. 

CheY-P concentrations were monitored by measuring FRET 
between cyan fluorescent protein (CFP)-conjugated CheZ (CheZ–CFP) 
and yellow fluorescent protein (YFP)-conjugated CheY (CheY–YFP). 
We measured responses in cheR cheB cells to stepwise addition of 
1 mM MeAsp (Fig. 2a). The response shown in Fig. 2a is similar to that 
obtained previously’. No adaptation is apparent. To rule out possible 
complications due to CheZ oligomerization”’, we also measured 
CFP–FliM/CheY–YFP FRET” in cheR cheB cells following stepwise addition 
of 2 mM MeAsp + 0.5 mM L-serine (a stronger stimulus was needed because 
of the lower sensitivity of CFP–FliM/CheY–YFP FRET), as shown in 
Fig. 2b. No adaptation is apparent in either panel of Fig. 2, so the partial 
adaptation shown in Fig. 1 does not occur upstream of CheY-P. It must 
occur at the level of the flagellar motor. 

Clockwise (CW) biases of motors were measured before addition of 
attractant, immediately after addition of attractant, and after time was 
allowed for partial adaptation. We focused on motors with pre- 
stimulus CW biases around 0.8 (ranging from 0.70 to 0.95). Owing 
to cell-to-cell variation, the lowest biases following stimulation ranged 
from 0 to 0.75. The concentration of CheY-P in a given cell was 
estimated from its CW bias at the time of the lowest bias, using the 
response curve measured previously®, shown by the red line in Fig. 3 
(Hill coefficient $N_{\mathrm{H}} = 10.3$, CheY-P dissociation constant K = 3.1 µM). 
Then the CW bias found after that cell had adapted was plotted as a 
function of this concentration, as shown by the blue data points in Fig. 3. 
The measurements were carried out for 49 motors on different cheR 
cheB cells with stepwise addition of 0.5 or 1 mM MeAsp. Following 
adaptation, the relationship between the CW bias and the concentration 
of CheY-P shifted to lower concentrations of CheY-P, increasing motor 
sensitivity to CheY-P. 

Figure 2 | FRET responses (Y/C ratio) of cheR cheB cells to stepwise 
addition of chemical attractants. The attractants were added at the times 
indicated by the arrows. a, CheY–YFP/CheZ–CFP FRET responses to 1 mM 
MeAsp (weak attractant). b, CheY–YFP/CFP–FliM FRET responses to 2 mM 
MeAsp + 0.5 mM L-serine (strong attractant). 
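
The bias-to-concentration conversion described above can be illustrated with a short Python sketch. It assumes that the previously measured response curve has the standard Hill form, bias = c^N_H / (c^N_H + K^N_H), with the quoted values N_H = 10.3 and K = 3.1 µM; the function names are hypothetical and this is not the authors' analysis code.

```python
# Sketch only: assumes a Hill-type response curve with the parameters
# quoted in the text (Hill coefficient 10.3, K = 3.1 uM).

K_UM = 3.1      # CheY-P concentration at half-maximal CW bias (uM)
N_HILL = 10.3   # Hill coefficient


def cw_bias_hill(chey_p_um, k=K_UM, n_h=N_HILL):
    """CW bias predicted by the assumed Hill curve."""
    x = (chey_p_um / k) ** n_h
    return x / (1.0 + x)


def chey_p_from_bias(bias, k=K_UM, n_h=N_HILL):
    """Invert the Hill curve; defined only for 0 < bias < 1."""
    if not 0.0 < bias < 1.0:
        raise ValueError("bias must lie strictly between 0 and 1")
    return k * (bias / (1.0 - bias)) ** (1.0 / n_h)


if __name__ == "__main__":
    for b in (0.1, 0.5, 0.75):
        print(f"CW bias {b:.2f} -> [CheY-P] ~ {chey_p_from_bias(b):.2f} uM")
```

Because the curve is so steep, a bias of exactly 0 or 1 cannot be inverted, so in this sketch only intermediate biases yield a concentration estimate.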

How the motor accomplishes this shift is intriguing. We sought to 
explain the shift in the motor response curve by using a 
Monod–Wyman–Changeux (MWC) type model”*”*, which has been used previously 
to explain the motor switching kinetics”. In this model, the C-ring 
is considered to be an allosteric switch, stochastically switching between 
two conformational states, counterclockwise (CCW) and CW, with 
N independent binding sites for CheY-P, corresponding to N units of 
FliM in the C-ring. The CW state has a higher affinity to CheY-P than 
the CCW state. The CW bias of the motor is given by 

$$B_{\mathrm{CW}} = \frac{(1 + [\text{CheY-P}]/K)^{N}}{(1 + [\text{CheY-P}]/K)^{N} + L\,(1 + [\text{CheY-P}]/(KC))^{N}},$$

where L is the ratio of the probability that the motor is in the CCW state 
to the probability that it is in the CW state in the absence of CheY-P, K 
is the CheY-P dissociation constant for the CW state, and C is the ratio 
of dissociation constants for the CCW and CW states, respectively”®. 
With reasonable values for the parameters, for example, N = 34, 
$L = 10^{7}$ and K = 3.1 µM®”’, the model can be fit to the ultrasensitivity 
data of ref. 6 with a best-fit value of C of 4.1, as shown by the red curve 
in Fig. 3. With these values for the parameters L, K and C, and assuming 
that the number of FliM units N varies with [CheY-P], we can fit the 
data measured for the adapted motor using the MWC model with 
$N = N_{\mathrm{av}} + \alpha([\text{CheY-P}] - 2.7)$, where N is written as a Taylor expansion 
about the average value of [CheY-P], 2.7 µM. We obtain a two-parameter 
fit with $N_{\mathrm{av}} = 36$ and $\alpha = -1.2\ \mu\mathrm{M}^{-1}$, shown by the green 
curve in Fig. 3. The average value of N has increased from 34 to 36. 
Sensitivity of an MWC complex is known to increase with N for fixed 
values of L, K and C (ref. 28). Equivalently, the motor bias versus 
[CheY-P] curve shifts to smaller [CheY-P] with larger N as shown in 
Fig. 3. Intuitively, the fact that increasing the number of FliM units 
causes an increase in CW bias can be understood by considering the 
energetics of the switch. Each CheY-P binding decreases the energy 
level of the CW state by a specific amount. With the values for parameters 
L, K and C fixed, increasing the number of FliM, that is, CheY-P 
binding sites, increases the number of CheY-P bound to the motor. 
This decreases the energy level of the CW state, thereby increasing the 
CW bias. 

Figure 3 | CW bias as a function of CheY-P concentration. The red curve is 
for pre-stimulus wild-type motors, as measured in ref. 6 and fit with an MWC 
model with 34 FliM units”*. The blue dots are data for motors that have partially 
adapted to stepwise addition of attractant, with a two-parameter fit to the 
MWC model with $N_{\mathrm{av}} = 36$ FliM units shown by the green curve; see text. 
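
As a rough numerical illustration of the MWC argument, the sketch below evaluates the bias expression for the parameter values quoted in the text (L = 10^7, K = 3.1 µM, C = 4.1) and compares N = 34 with the adapted case in which N depends linearly on [CheY-P]. The function names are hypothetical and the code only illustrates the formula; it is not the fitting procedure used for Fig. 3.

```python
# Sketch of the MWC bias formula with the parameter values quoted in the text.

def mwc_cw_bias(chey_p_um, n_sites, l_ratio=1e7, k_um=3.1, c_ratio=4.1):
    """CW bias of an MWC switch with n_sites CheY-P binding sites."""
    cw_term = (1.0 + chey_p_um / k_um) ** n_sites
    ccw_term = l_ratio * (1.0 + chey_p_um / (k_um * c_ratio)) ** n_sites
    return cw_term / (cw_term + ccw_term)


def adapted_sites(chey_p_um, n_av=36.0, alpha=-1.2, y_av=2.7):
    """Linear (first-order Taylor) dependence of N on [CheY-P] in uM."""
    return n_av + alpha * (chey_p_um - y_av)


if __name__ == "__main__":
    for y in (2.0, 2.5, 3.0, 3.5):
        pre = mwc_cw_bias(y, 34)
        post = mwc_cw_bias(y, adapted_sites(y))
        print(f"[CheY-P] = {y:.1f} uM: bias N=34 {pre:.3f}, adapted {post:.3f}")
```

With L, K and C held fixed, the ratio of the CW term to the CCW term grows with N whenever [CheY-P] > 0, which is why a larger N shifts the curve towards lower concentrations.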

To test directly for this increase of the number of FliM units, we 
fused YFP to the carboxy terminus of FliM and monitored the fluor- 
escence intensity of single motors using TIRF microscopy. To minimize 
shifts in motor position, we tethered a cheR cheB strain that lacks the 
flagellar filament to glass via single hooks with anti-hook antibody. 
Changes in fluorescence were measured upon addition of 2 mM 
MeAsp + 0.5 mM L-serine, which should saturate the chemoreceptors 
and eliminate CheY-P, or of 1 mM MeAsp, which should simply reduce 
the concentration of CheY-P. The results are shown in Fig. 4. Figure 4a 
is the averaged response of 20 motors to the strong attractant, and 
Fig. 4b is the averaged response of 22 motors to the weak attractant. 
In either case, the fluorescence intensity increased following the addition 
of attractant on a time scale consistent with partial adaptation, by 
a larger amount for the stronger attractant. We compensated for 
fluorescence bleaching by subtracting a control curve (Fig. 4c) and 
fitting the results to a model in which the FliM off rate decreases when 
the CheY-P concentration decreases; see Methods. Steady state is 
reached when $N k_{\mathrm{off}} = (M - N) k_{\mathrm{on}}$, where N is the number of FliM 
molecules in the motor, and M is the maximum number of FliM binding 
sites in the motor. The fits are shown in magenta and the final values 
for N are given in each panel (assuming an initial value of 34). These 
values agree with those presented in Fig. 3. So the motor increases the 
number of FliM units as it partially adapts to a decrement in the concentration 
of CheY-P. By doing so, it increases its dynamic range. 

Figure 4 | Changes in single-motor FliM–YFP fluorescence intensities in 
cheR cheB cells tethered by hooks and stimulated by addition of attractant 
(at time $t_0$, arrow). a–c, The intensity for a given motor was normalized by its 
intensity at time 0, intensities for different motors were averaged, and the fit to 
the control of panel c was subtracted. Fits with the model are shown in magenta 
and the parameters for these fits are given. Error bars are standard errors of the 
mean. a, Responses of 20 motors to addition of 2 mM MeAsp + 0.5 mM 
L-serine. b, Responses of 22 motors to addition of 1 mM MeAsp. c, Responses of 
15 motors without addition of attractant, fit with an exponential decay function 
plus a constant. 

We eliminated the concern that binding of CheY-P to FliM–YFP 
might differ from binding to wild-type FliM by using the bead 
assay to compare the biases, switching rates and speeds for motors 
of cheR cheB cells expressing FliM–YFP or wild-type FliM: the biases 
were 61 ± 16% or 58 ± 19%, the switching rates were 3.8 ± 1.2 
or 3.4 ± 1.0 s⁻¹, and the speeds were 50.4 ± 8.4 or 51.5 ± 8.2 Hz, 
respectively. 

The motor adaptation mechanism observed here is related to the 
turnover of motor C-ring components discovered recently””®, where 
exchange of FliM was found to be signal-dependent and exhibited a 
similar timescale’. The detailed mechanism should involve changes in 
FliM on/off rates dependent upon either CheY-P binding or rates of 
motor switching. As noted earlier, the timescale for motor adaptation 
(1 min) is much slower than that for receptor methylation/demethylation 
(1 s), which enables cells to make rapid temporal comparisons; 
thus, motor adaptation does not play a critical role in that aspect. 
Instead, it helps match the operating point of the motor to the output 
of the chemotaxis receptor complex. 


METHODS SUMMARY 


All strains used in this study were derivatives of Escherichia coli K12 strain RP437. 
Cells were grown at 33 °C in 10 ml T-broth supplemented with the appropriate 
antibiotics and inducers to an A600 nm of 0.45 to 0.50. Cells were collected by 
centrifugation (10 min at 1,300g), washed twice in 10 ml of motility medium 
(10 mM potassium phosphate/0.1 mM EDTA/1 µM methionine/10 mM lactic 
acid, pH 7.0), and resuspended in 10 ml of this medium. They were used immediately 
for experiments or stored at 4 °C for up to 2 h. All experiments were carried 
out with a custom-made flow chamber at room temperature. 

For the bead assay, cells were sheared to truncate flagella, and 1.0-µm-diameter 
polystyrene latex beads were attached to the filament stubs. Rotation of the bead 
was monitored with a laser dark-field setup described previously”’. Rotational 
velocity as a function of time was determined for each motor and smoothed with 
a 25-point running average. CW bias was calculated over a 20-s interval every 2s, 
leading to a plot of CW bias versus time. 

FRET measurements of bacterial populations were carried out as described 
previously*®. 

For TIRF measurements, cells were tethered to the bottom window of the flow 
chamber by single hooks using anti-FlgE antibody, following a protocol adapted 
from ref. 21. The fluorescence intensity of the motors was monitored with a TIRF 
microscope (Nikon Eclipse Ti-U), and images were recorded with a back- 
illuminated, cooled (—55 °C), electron-multiplying CCD camera (DV887ECS- 
BV, Andor Technology). Image analysis of the motor spots was carried out using 
a Gaussian mask method described previously”. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Strains and plasmids. All strains used in this study are derivatives of E. coli K12 
strain RP437 (ref. 31): JY32 (cheR cheB cheY cheZ fliC), JY35 (cheR cheB fliC), 
RP2893 (Δ2206(tap-cheZ))"', JY37 (cheR cheB cheY fliM), and JY40 (cheR cheB 
fliM fliC). The fliM-eyfp*?°* fusion with a 3× glycine linker was cloned into 
pTrc99A* under an isopropyl-β-D-thiogalactoside (IPTG)-inducible promoter, 
yielding pRWB7. pDFB72 carrying wild-type fliM on pTrc99A was a gift from 
D. Blair. pVS7 carrying wild-type cheY on a pBAD18-Kan” vector was a gift from 
V. Sourjik. pVS18 carrying cheY-eyfp on pTrc99A, pVS31 carrying ecfp-fliM on 
pBAD33 (ref. 33), and pVS54 carrying cheZ-ecfp on pBAD33, were described 
previously'!**. pKAF131 carrying the sticky fliC allele under control of the native 
fliC promoter, was described previously™*. For studies of cheR cheB cells with the 
bead assay, JY35 carrying pKAF131 was used. For studies of cheR cheB cheZ cells 
with the bead assay, JY32 carrying pVS7 and pKAF131 was used. For CheZ—CFP/ 
CheY—YFP FRET studies of cheR cheB cells, RP2893 carrying pVS18 and pVS54 
was used. For CFP-FliM/CheY—YFP FRET studies of cheR cheB cells, JY37 
carrying pVS18 and pVS31 was used. For the TIRF studies of single motors, JY40 
carrying pRWB7 was used. For comparison of motors with wild-type FliM and 
FliM–YFP, JY40 carrying pDFB72 and pKAF131, and JY40 carrying pRWB7 and 
pKAF131 were used. Cells were grown at 33 °C in 10 ml T-broth (1% tryptone and 
0.5% NaCl) supplemented with the appropriate antibiotics (ampicillin: 100 µg ml⁻¹, 
kanamycin: 50 µg ml⁻¹, chloramphenicol: 34 µg ml⁻¹) and inducers (0.005% 
arabinose for the bead assay, 0.01% arabinose and 50 µM IPTG for the FRET 
studies, 100 µM IPTG for the TIRF studies) to an A600 nm of 0.45 to 0.50. Cells were 
collected by centrifugation (10 min at 1,300g), washed twice in 10 ml of motility 
medium (10 mM potassium phosphate/0.1 mM EDTA/1 µM methionine/10 mM 
lactic acid, pH 7.0), and resuspended in 10 ml of this medium. They were used 
immediately for experiments or stored at 4 °C for up to 2 h. 

Bead assay and data analysis. Cells were sheared to truncate flagella by passing 
1 ml of the washed-cell suspension 50 times between two syringes equipped with 
23-gauge needles and connected by a 7-cm length of polyethylene tubing (0.58 mm 
internal diameter, catalogue no. 427411; Becton Dickinson). The sheared cell 
suspension was centrifuged and resuspended in 0.5 ml of motility medium. 
50 µl of this suspension was placed on a glass coverslip coated with poly-L-lysine 
(0.01%, catalogue no. P4707; Sigma) and allowed to stand for 2 min, then 5 µl of 
1.0-µm-diameter polystyrene latex beads (2.69%, catalogue no. 07310; 
Polysciences) was added, mixed by gentle pipetting, and allowed to stand for 
2 min. The coverslip was installed as the top window of a flow chamber*’ and 
rinsed with motility medium. The chamber was kept under a constant flow of 
buffer (400 µl min⁻¹) by a syringe pump (Harvard Apparatus). Rotation of the 
bead was monitored with a laser dark-field setup described previously”. Outputs 
from the photomultiplier tubes were directly coupled to an eight-pole low-pass 
Bessel filter (3384, Krohn-Hite) with a cutoff frequency of 200 Hz and sampled at 
500 Hz using LabView. For each experiment, the rotation of the bead was monitored 
for 70s, then the medium was switched to attractants, and the rotation was 
monitored further for about 400 s. Rotational velocity as a function of time was 
determined for each motor as described previously” and smoothed with a 25- 
point running average. CW bias was calculated over a 20-s interval every 2s, 
leading to a plot of CW bias versus time. 
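
The windowed bias calculation described above can be sketched in Python as follows. The sign convention (positive velocity taken as CW), the centred form of the running average and the function names are assumptions made for illustration; the text specifies only the 25-point smoothing, the 20-s window and the 2-s step.

```python
# Sketch: smooth a rotational-velocity trace, then compute CW bias in
# sliding windows. Positive velocity is ASSUMED to mean CW rotation.

def running_average(values, window=25):
    """Centred running mean; near the edges, uses the samples available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def cw_bias_trace(velocity_hz, sample_rate_hz=500, window_s=20.0, step_s=2.0):
    """Return (window centre time in s, CW bias) pairs."""
    smoothed = running_average(velocity_hz)
    win = int(window_s * sample_rate_hz)
    step = int(step_s * sample_rate_hz)
    trace = []
    for start in range(0, len(smoothed) - win + 1, step):
        chunk = smoothed[start:start + win]
        cw_samples = sum(1 for v in chunk if v > 0)
        centre_time = start / sample_rate_hz + window_s / 2.0
        trace.append((centre_time, cw_samples / win))
    return trace


if __name__ == "__main__":
    # Toy trace at 10 Hz: 20 s of CW rotation followed by 20 s of CCW.
    toy = [50.0] * 200 + [-50.0] * 200
    for t, bias in cw_bias_trace(toy, sample_rate_hz=10):
        print(f"t = {t:5.1f} s  CW bias = {bias:.2f}")
```

Here the bias is defined as the fraction of samples in each window during which the smoothed velocity indicates CW rotation, which is one common way to compute it.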

FRET measurements. FRET measurements of bacterial populations were carried 
out as described previously'**’, except that the epifluorescent illumination was 
provided by an LED white light source (MCWHL2-C3, Thorlabs) through an 
excitation bandpass filter (FF01-438/24-25, Semrock). For each experiment, 
4 ml of the washed-cell suspension was centrifuged and resuspended in 55 µl of 
motility medium, which was placed on a glass coverslip coated with poly-L-lysine 
(0.1%, catalogue no. P8920, Sigma) and allowed to stand for 5 min. The coverslip 
was installed as the top window of a flow chamber® and rinsed with motility 
medium. The chamber was kept under a constant flow of buffer (500 µl min⁻¹). 
Epifluorescent emission was split into donor (cyan, C) and acceptor (yellow, Y) 
channels and collected by photon-counting photomultipliers (H7421-40, 
Hamamatsu). Signal intensities of these two channels were recorded by a computer 
running LabView, and the ratio between them (R = Y/C) provided an indicator of 
FRET activity. The FRET traces were smoothed with a median filter of rank 3. 

TIRF measurements, data analysis and fits to the model. Cells were tethered to 
glass by hooks using anti-FlgE antibody, following a protocol adapted from ref. 21: 
350 µl of the washed-cell suspension was centrifuged and resuspended in 100 µl of 
motility medium; 10 µl of anti-FlgE antibody (0.1 mg ml⁻¹) was added, and the 
mixture was incubated at 23 °C for 25 min. The antibody-treated cells were washed 
twice with 350 µl of motility medium and gently resuspended in 55 µl of motility 
medium. This cell suspension was placed on the bottom coverslip of a flow chamber 
(which was washed earlier with ethanol and distilled H2O and air-dried for 2 h) and 
allowed to stand for 15 min. The top coverslip of the flow chamber was then installed 
and rinsed with motility medium. The chamber was kept under a constant flow of 
buffer (100 µl min⁻¹). For each experiment, the flow was switched to attractants at 
1 min before the start. It took about 1 min and 30 s for the attractant to reach the flow 
chamber and about 30 s to replace the medium, so effectively the medium reached 
the cells between 30 s and 60 s after the start of each experiment. Only stably rotating 
and switching motors were monitored. The motors observed usually started with 
high CW bias; upon addition of strong attractant, they changed to exclusively CCW 
and remained 100% CCW throughout the observation time; upon addition of weak 
attractant, their CW bias reduced and later partially recovered; upon addition of 
motility medium, their CW bias did not change. The fluorescent intensity of the 
motors was monitored with a TIRF microscope (Nikon Eclipse Ti-U), and images 
were recorded at 65 nm per pixel with a back-illuminated, cooled (—55 °C), electron- 
multiplying CCD camera (DV887ECS-BV, Andor Technology). The camera was 
controlled by Andor Solis software running on a desktop computer. Image acquisi- 
tion was under Andor Solis ‘kinetic’ mode, with 200 ms exposures every 6s for 50 
exposures for each motor. The laser illumination was blocked between exposures. 

Images of the motor spots showed radially symmetric and approximately 
Gaussian intensity profiles. The width of these spots was about 5 pixels 
(325 nm). The fluorescent intensity centroid for each motor was calculated using 
a Gaussian mask method described previously'*”*. Specifically, an initial estimate 
was made based on the peak pixel intensity, a 9 × 9 pixel region of interest (ROI) 
was defined centring on the initial motor centroid, and the motor centroid was 
calculated as follows: first, a circular motor mask of diameter 300 nm was applied 
to the ROI centring on the current motor centroid. Second, pixel intensities within 
the motor mask were multiplied by a radially symmetric two-dimensional 
Gaussian mask of fixed half-width 170 nm, and a revised estimate for the motor 
centroid was calculated using a weighted average. Lastly, the previous two steps 
were iterated either 150 times or until the motor mask began clipping the side of 
the ROI. We also calculated the motor centroid with a two-dimensional Gaussian 
fitting, and both methods yielded comparable results. After the centroid was 
calculated, the background intensity was defined as the mean pixel intensity within 
the ROI but external to the final motor mask, and the motor intensity was calcu- 
lated as the sum of all pixel intensities within the motor mask after subtraction of 
the background intensity from each pixel value. 
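
The iterative centroid and intensity calculation described above might be sketched as follows in plain Python. Treating the quoted 170-nm half-width as the Gaussian standard deviation, the `synthetic_roi` helper and all function names are assumptions made for illustration; the published method (cited above) may differ in detail.

```python
import math

PIXEL_NM = 65.0  # image scale quoted in the text (nm per pixel)


def synthetic_roi(size=9, centre=(4.0, 4.0), amplitude=100.0,
                  sigma_px=1.2, background=10.0):
    """Hypothetical test image: Gaussian spot on a flat background."""
    cx0, cy0 = centre
    return [[background + amplitude * math.exp(
                -((x - cx0) ** 2 + (y - cy0) ** 2) / (2.0 * sigma_px ** 2))
             for x in range(size)] for y in range(size)]


def gaussian_mask_centroid(roi, mask_diameter_nm=300.0,
                           half_width_nm=170.0, max_iter=150):
    """Iteratively refine the spot centre (x, y), in pixel coordinates."""
    size = len(roi)
    radius = mask_diameter_nm / (2.0 * PIXEL_NM)
    sigma = half_width_nm / PIXEL_NM  # ASSUMPTION: half-width used as sigma
    # Initial estimate: the brightest pixel in the ROI.
    cy, cx = max(((y, x) for y in range(size) for x in range(size)),
                 key=lambda p: roi[p[0]][p[1]])
    cx, cy = float(cx), float(cy)
    for _ in range(max_iter):
        # Stop once the circular mask would clip the side of the ROI.
        if min(cx, cy) - radius < 0 or max(cx, cy) + radius > size - 1:
            break
        wsum = xsum = ysum = 0.0
        for y in range(size):
            for x in range(size):
                r2 = (x - cx) ** 2 + (y - cy) ** 2
                if r2 <= radius ** 2:
                    w = roi[y][x] * math.exp(-r2 / (2.0 * sigma ** 2))
                    wsum += w
                    xsum += w * x
                    ysum += w * y
        if wsum <= 0.0:
            break
        cx, cy = xsum / wsum, ysum / wsum
    return cx, cy


def motor_intensity(roi, cx, cy, mask_diameter_nm=300.0):
    """Sum of background-subtracted pixel values inside the motor mask."""
    radius = mask_diameter_nm / (2.0 * PIXEL_NM)
    inside, outside = [], []
    for y, row in enumerate(roi):
        for x, value in enumerate(row):
            r2 = (x - cx) ** 2 + (y - cy) ** 2
            (inside if r2 <= radius ** 2 else outside).append(value)
    background = sum(outside) / len(outside)
    return sum(v - background for v in inside)


if __name__ == "__main__":
    roi = synthetic_roi()
    cx, cy = gaussian_mask_centroid(roi)
    print(f"centroid = ({cx:.3f}, {cy:.3f}), "
          f"intensity = {motor_intensity(roi, cx, cy):.1f}")
```

The background here is the mean of the ROI pixels outside the final mask, following the description in the text.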

The model assumes that CheY-P binding destabilizes FliM, so that when [CheY-P] 
suddenly decreases due to addition of attractant, $k_{\mathrm{off}}$ (the off rate of each FliM 
unit) decreases, while $k_{\mathrm{on}}$ remains the same. When the number of FliM units (N) in 
the C-ring reaches a new steady state, $N k_{\mathrm{off}} = (M - N) k_{\mathrm{on}}$, where M is the maximum 
number of FliM binding sites in a motor. The pre-stimulus N is assumed to be 34. 
During the response to the attractant step, the increment of N satisfies 
$dn = ((M - (n + 34)) k_{\mathrm{on}} - (n + 34) k_{\mathrm{off}})\,dt$, while the increment of the normalized 
motor intensity satisfies $df = dn/a - \lambda f\,dt$, where a is the normalization factor that 
converts the number of FliM units to fluorescence intensity, and $\lambda$ is the fluorescence 
bleaching rate obtained by fitting the control curve (Fig. 4c). Solving these two 
differential equations with the initial conditions n(0) = 0, f(0) = 0 leads to: 

$$f = \frac{k_{\mathrm{on}} M - 34\,(k_{\mathrm{on}} + k_{\mathrm{off}})}{a\,(k_{\mathrm{on}} + k_{\mathrm{off}} - \lambda)} \left(e^{-\lambda t} - e^{-(k_{\mathrm{on}} + k_{\mathrm{off}})t}\right)$$

$$\phantom{f} = \frac{(N - 34)(k_{\mathrm{on}} + k_{\mathrm{off}})}{a\,(k_{\mathrm{on}} + k_{\mathrm{off}} - \lambda)} \left(e^{-\lambda t} - e^{-(k_{\mathrm{on}} + k_{\mathrm{off}})t}\right)$$

where N in the second form is the new steady-state number of FliM units. 
If the time of arrival of the attractant at the cell is $t_0$ instead of 0, change t in the above 
equations to $t - t_0$. 
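
As a consistency check on the turnover model, the sketch below evaluates the closed-form solutions n(t) = (N − 34)(1 − e^{−(k_on + k_off)t}) and f(t) as given by the two differential equations above, and compares them with a simple forward-Euler integration of those equations. The parameter values in the example are purely hypothetical placeholders (not the fitted values of Fig. 4), and the function names are illustrative.

```python
import math

# Sketch of the FliM turnover model with photobleaching. All numerical
# parameter values used below are HYPOTHETICAL and for illustration only.


def flim_increment(t, k_on, k_off, m_sites, n0=34.0):
    """Closed-form n(t): increase in FliM number after the attractant step."""
    k = k_on + k_off
    n_final = m_sites * k_on / k  # steady state: N * k_off = (M - N) * k_on
    return (n_final - n0) * (1.0 - math.exp(-k * t))


def normalized_intensity(t, k_on, k_off, m_sites, a, bleach_rate, n0=34.0):
    """Closed-form f(t); requires k_on + k_off != bleach_rate."""
    k = k_on + k_off
    amplitude = (k_on * m_sites - n0 * k) / (a * (k - bleach_rate))
    return amplitude * (math.exp(-bleach_rate * t) - math.exp(-k * t))


def euler_check(t_end, k_on, k_off, m_sites, a, bleach_rate,
                n0=34.0, dt=1e-3):
    """Integrate dn and df directly, starting from n(0) = f(0) = 0."""
    n = f = 0.0
    for _ in range(int(round(t_end / dt))):
        dn = ((m_sites - (n + n0)) * k_on - (n + n0) * k_off) * dt
        f += dn / a - bleach_rate * f * dt
        n += dn
    return n, f


if __name__ == "__main__":
    p = dict(k_on=0.01, k_off=0.003, m_sites=56.0)  # hypothetical rates (1/s)
    n_num, f_num = euler_check(100.0, a=34.0, bleach_rate=0.005, **p)
    print("n(100 s): closed form", flim_increment(100.0, **p),
          "numerical", n_num)
    print("f(100 s): closed form",
          normalized_intensity(100.0, a=34.0, bleach_rate=0.005, **p),
          "numerical", f_num)
```

To account for a delayed arrival of the attractant at time t_0, the same functions can be called with t − t_0 in place of t.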

Trans-synaptic Teneurin signalling in neuromuscular synapse organization and target choice 


Timothy J. Mosca¹, Weizhe Hong¹*, Vardhan S. Dani¹, Vincenzo Favaloro¹ & Liqun Luo¹ 


Synapse assembly requires trans-synaptic signals between the pre- 
and postsynapse’, but our understanding of the essential organiza- 
tional molecules involved in this process remains incomplete’. 
Teneurin proteins are conserved, epidermal growth factor (EGF)- 
repeat-containing transmembrane proteins with large extracellular 
domains’. Here we show that two Drosophila Teneurins, Ten-m and 
Ten-a, are required for neuromuscular synapse organization and 
target selection. Ten-a is presynaptic whereas Ten-m is mostly post- 
synaptic; neuronal Ten-a and muscle Ten-m form a complex in vivo. 
Pre- or postsynaptic Teneurin perturbations cause severe synapse 
loss and impair many facets of organization trans-synaptically and 
cell autonomously. These include defects in active zone apposition, 
release sites, membrane and vesicle organization, and synaptic trans- 
mission. Moreover, the presynaptic microtubule and postsynaptic 
spectrin cytoskeletons are severely disrupted, suggesting a mech- 
anism whereby Teneurins organize the cytoskeleton, which in turn 
affects other aspects of synapse development. Supporting this, 
Ten-m physically interacts with α-Spectrin. Genetic analyses of 
teneurin and neuroligin reveal that they have differential roles that 
synergize to promote synapse assembly. Finally, at elevated 
endogenous levels, Ten-m regulates target selection between specific 
motor neurons and muscles. Our study identifies the Teneurins as a 
key bi-directional trans-synaptic signal involved in general synapse 
organization, and demonstrates that proteins such as these can also 
regulate target selection. 

Vertebrate teneurins are enriched in the developing brain*”, localize 
to synapses in culture’, and pattern visual connections’. Both 
Drosophila Teneurins, Ten-m and Ten-a, function in olfactory synaptic 
partner matching® and were further identified in neuromuscular junc- 
tion (NMJ) defect screens””°®, with Ten-m also affecting motor axon 
guidance’. We examine their roles and the underlying mechanisms 
involved in synapse development. 

Both Ten-m and Ten-a were enriched at the larval NMJ (Fig. 1a and 
Supplementary Fig. 1a). Ten-a was detected at neuronal membranes: 
this staining was undetectable beyond background in ten-a null 
mutants (Supplementary Fig. 1b) and barely detectable after neuronal 
ten-a RNA interference (RNAi; Supplementary Fig. 1c), indicating that 
Ten-a is predominantly presynaptic. Partial co-localization was 
observed between Ten-a and the periactive zone marker Fasciclin 2 
(ref. 12) as well as the active zone marker Bruchpilot’’ (Fig. 1b, c), 
suggesting that Ten-a is localized to the junction between the periactive 
zone and the active zone. Ten-m appeared strongly postsynaptic and 
surrounded each bouton (Fig. 1a and Supplementary Fig. 1a, d). 
Muscle-specific ten-m RNAi eliminated the postsynaptic staining, 
but uncovered weak presynaptic staining (Supplementary Fig. 1e) that 
ubiquitous ten-m RNAi eliminated (Supplementary Fig. 1f). Thus, the 
Ten-m signal was specific and, although partly presynaptic, enriched 
postsynaptically. Consistently, muscle Ten-m colocalized extensively 
with Dlg (Fig. 1d) and completely with α-Spectrin (Fig. 1e) and is thus 
probably coincident with all postsynaptic membranes. 


The localization of Ten-a and Ten-m suggested their trans-synaptic 
interaction. To examine this, we co-expressed Myc-tagged Ten-a in 
nerves using the Q system’* and haemagglutinin (HA)-tagged 
Ten-m in muscles using GAL4. Muscle Ten-m was able to co- 
immunoprecipitate nerve Ten-a from larval synaptosomes (Fig. 1f), 
suggesting that the Teneurins form a heterophilic trans-synaptic 
receptor pair at the NMJ. 

To determine Teneurin function at the NMJ, we examined the ten-a 
null allele and larvae with neuron or muscle RNAi of ten-a and/or ten-m. 
Following such perturbations, bouton number and size were altered: the 
quantity was reduced by 55% (Fig. 2a—c, g and Supplementary Fig. 2) 


Figure 1 | Teneurins are enriched at and interact across Drosophila 
neuromuscular synapses. a–e, Representative single confocal sections of 
synaptic boutons stained with antibodies against Ten-a (red) or Ten-m (green), 
horseradish peroxidase (HRP) to mark the neuronal membrane (blue), and a 
synaptic marker as indicated. a, Ten-a is associated with presynaptic 
membranes and Ten-m largely with the surrounding postsynapse. b, c, Ten-a 
shows limited co-localization with the periactive zone marker Fasciclin 2 
(b), and Bruchpilot (Brp), an active zone marker (c). d, e, Ten-m co-localizes 
with, and extends beyond, Dlg (d) and completely co-localizes with muscle 
α-Spectrin (e). f, Immunoblots (IB) of larval synaptosomes expressing neuronal 
Flag–Myc-tagged Ten-a (N) and muscle Flag–HA-tagged Ten-m (M) and 
immunoprecipitated (IP) using antibodies to HA. Blot panels are labelled 
IB: Myc (Ten-a), IB: HA (Ten-m) and IB: Brp (input). Ten-a is detected in the 
pull-down, indicating that nerve Ten-a and muscle Ten-m interact across the NMJ. 
This is not seen in control lanes. Owing to low expression, neither transgene 
product is detectable in input lysates, which are enriched in Brp. 
Scale bar, 5 µm. 


1Department of Biology, Howard Hughes Medical Institute, Stanford University, Stanford, California 94305, USA. 


*These authors contributed equally to this work. 


12 APRIL 2012 | VOL 484 | NATURE | 237 


©2012 Macmillan Publishers Limited. All rights reserved 




Figure 2 | Teneurins affect the structure and function of the neuromuscular 
synapse. a–f, Representative NMJs stained with antibodies to HRP. Muscle-specific 
ten-m RNAi (M-IR) and loss of ten-a decrease bouton number. 
Neuronal (Ten-a N) but not muscle (Ten-a M) restoration of ten-a expression 
rescues this phenotype. These defects resemble nlg1 mutants and are enhanced 
in ten-a nlg1 double mutants. g, Quantification of bouton number. 
h–j, Representative NMJs stained with antibodies to Brp (green), the glutamate 
receptor subunit DGluRIII (red) and HRP (blue). In control larvae (h), Brp and 
DGluRIII puncta properly appose. teneurin perturbations (i, j) disrupt this 
active zone (yellow arrowhead) and glutamate receptor apposition (white 
arrowhead). k, Quantification of unapposed active zone/glutamate receptor 
pairs. For each quantification, n = 8 larvae, 16 NMJs. l–q, Transmission 
electron microscopy of active zone T-bars (asterisks) in control larvae (l) and 


and the incidence of large boutons markedly increased (Supplemen- 
tary Fig. 2k). Both changes indicate impaired synaptic morphogenesis. 
The reduction in bouton number was probably cumulative through 
development, as it was visible in first instar ten-a mutants and persisted 
(Supplementary Fig. 2k). In the ten-a mutant, bouton morphogenesis 
was rescued by restoring Ten-a expression in neurons, but not muscles 
(Fig. 2d, g and Supplementary Fig. 2). Neuronal Ten-m overexpres- 
sion could not substitute for the lack of Ten-a, revealing their non- 
equivalence (Supplementary Fig. 2e, 1). Neuronal knockdown of Ten-a 
or Ten-m resulted in fewer synaptic boutons (Supplementary Fig. 2f- 
h, 1), indicating that both have a presynaptic function, although pre- 
synaptic Ten-a has a more predominant role (Supplementary Fig. 21). 
Moreover, knocking down postsynaptic Ten-m in the ten-a mutant 
did not enhance the phenotype (Fig. 2g). Thus, presynaptic Ten-a 
(and, to a lesser extent, Ten-m) and postsynaptic Ten-m are required 
for synapse development. 

teneurin perturbation also caused defects in the apposition between 
presynaptic active zones (release sites) and postsynaptic glutamate 
receptor clusters'> (Fig. 2h and Supplementary Fig. 3): up to 15% of 
the active zones/receptor clusters lacked their partner compared to 
1.8% in controls (Fig. 2h-k). Under electron microscopy, active zones 
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ten-a mutants (m-q) showing double T-bars (m), detached T-bars 

(n), misshapen T-bars (0), membrane ruffling (p, waved arrows) and T-bars 
facing contractile tissue (q). Some images show multiple defects. r, Distribution 
of T-bar defects as a percentage of the total T-bars. N-IR, neuronal RNAi. For 
each genotype, n = 3 larvae, 40 boutons. s, t, Representative evoked EPSP 

(s) and mEPSP (t) traces from control and ten-a mutant genotypes. 

u, Quantification of mean EPSP amplitude (black), mEPSP amplitude (red), 
mEPSP frequency (blue) and quantal content (QC, white) expressed as a 
percentage of the control average. For all genotypes, n = 7 larvae, 8 muscles. 
Error bars represent s.e.m. Scale bars, 10 [1m (a-f), 5 um (h-j), 100 nm (I- 

q). ***P < 0.001, **P < 0.01, *P <0.05, NS, not significant. Statistical 
comparisons are with control unless noted. 


are marked by electron-dense membranes and single presynaptic 
specializations called T-bars (Fig. 2l), which enable synapse assembly,
vesicle release and Ca²⁺-channel clustering. Teneurin disruption
caused defects (Fig. 2m-r and Supplementary Fig. 3) in T-bar ultrastruc- 
ture (Fig. 2m-o), membrane organization, and apposition to contractile 
tissue (Fig. 2p, q). Teneurin perturbation also impaired postsynaptic 
densities while increasing membrane ruffling (Supplementary Table 1), 
further indicating organizational deficiency. These phenotypes 
resemble mutants with adhesion and T-bar biogenesis defects,
suggesting a role for Teneurins in synaptic adhesion and stability. 
Synaptic vesicle populations similarly required Teneurins for cluster- 
ing at the bouton perimeter and proper density (Supplementary Fig. 4). 
As these effects are not synonymous with active zone disruption,
Teneurins are also required for synaptic vesicle organization. 
Synapses lacking teneurin were also functionally impaired. The mean 
amplitude of evoked excitatory postsynaptic potentials (EPSPs) in larvae 
was decreased by 28% in the ten-a mutant (Fig. 2s, u). Spontaneous 
miniature EPSPs showed a 20% decrease in amplitude, a 46% decrease in 
frequency (Fig. 2t, u), and an altered amplitude distribution compared 
with control (Supplementary Fig. 5a). These defects resulted in a 20% 
reduction in quantal content (Fig. 2u), which could be partly due to
fewer boutons and release sites. However, release probability may also be
reduced, as suggested by an increased paired pulse ratio in ten-a mutants 
(Supplementary Fig. 5d, e). The decay kinetics of responses were faster 
in ten-a mutants, suggesting additional postsynaptic effects on glutam- 
ate receptors and/or intrinsic membrane properties (Supplementary 
Fig. 5b, c). Further, FM1-43 dye loading revealed markedly defective 
vesicle cycling in ten-a mutants (Supplementary Fig. 5f, h). Consistent 
with physiological impairment, teneurin-perturbed larvae exhibited 
profound locomotor defects (Supplementary Fig. 5i). In summary, 
Teneurins are required for multiple aspects of NMJ organization and 
function. 

As a potential mechanism for synaptic disorganization following 
teneurin perturbation, we examined the pre- and postsynaptic 
cytoskeletons. In the presynaptic terminal, organized microtubules 
contain Futsch (a microtubule-binding protein)-positive ‘loops’, 
whereas disorganized microtubules possess punctate, ‘unbundled’ 
Futsch. Each classification normally represented ~10% (often distal)
of boutons (Fig. 3a, d and Supplementary Fig. 6). Upon teneurin per- 
turbation, many more boutons had unbundled Futsch (Fig. 3b, c and 
Supplementary Fig. 6) whereas those with looped microtubules were 
decreased by 62-95% (Fig. 3d). Therefore, proper microtubule organ- 
ization requires pre- and postsynaptic Teneurins. In contrast to mild 
active zone/glutamate receptor apposition defects, most boutons dis- 
played microtubule organizational defects. 

Figure 3 | Teneurin perturbation results in marked cytoskeletal
disorganization. a-c, Representative NMJs stained with antibodies to Futsch
(green) and HRP (magenta). Arrowheads indicate looped organization. Arrows
indicate unbundled Futsch. d, Quantification of the percentage of total boutons
with looped or unbundled microtubules. e-g, Representative NMJs stained
with antibodies to α-Spectrin (green), Dlg (red) and HRP (blue). Following
teneurin perturbation, α-Spectrin staining is largely lost. Axonal α-Spectrin is
unaffected by muscle teneurin RNAi (f). h, Quantification of α-Spectrin (green)
and Dlg (red) fluorescence. A.U., arbitrary units. For all genotypes, n = 6 larvae,
12 NMJs. i, Immunoblots (IB) showing that α-Spectrin is detected in the Flag
immunoprecipitates (IP) of larvae expressing muscle Flag-HA-tagged Ten-m
but not in control larvae. Owing to low expression, Flag-HA-Ten-m is only
detectable after enrichment by immunoprecipitation. j, Model showing the
roles of Teneurins, Neurexin and Neuroligin at the NMJ. Arrow size represents
the relative contribution of each pathway to the cellular process as inferred from
mutant phenotypic severity. Scale bars, 5 μm. ***P < 0.001, NS, not significant.

teneurin perturbation also severely disrupted the postsynaptic spectrin
cytoskeleton, with which Ten-m colocalized (Fig. 1e). Postsynaptic
α-Spectrin normally surrounds the bouton (Fig. 3e). Perturbing neuronal
or muscle Teneurins markedly reduced postsynaptic α-Spectrin without
affecting Dlg (Fig. 3f-h and Supplementary Fig. 7). Postsynaptic
β-Spectrin, Adducin and Wsp were similarly affected (Supplemen-
tary Fig. 8). In muscle, α-Spectrin is coincident with and essential for
the integrity of the membranous subsynaptic reticulum (SSR).
Consistent with this, teneurin disruption reduced SSR width by up to
70% (Supplementary Fig. 9d-g) and increased the frequency of ‘ghost’
boutons, which are failures of postsynaptic membrane organization
(Supplementary Fig. 9a-d). Thus, Teneurins are involved in the organ-
ization of the pre- and postsynaptic cytoskeletons and postsynaptic
membranes. Further, endogenous α-Spectrin co-immunoprecipitated
with muscle-expressed, Flag-tagged Ten-m (Fig. 3i), suggesting that
Ten-m physically links the synaptic membrane to the cytoskeleton.

Because the most severe defects following teneurin perturbation
were cytoskeletal, we propose that Teneurins primarily organize the
presynaptic microtubule and postsynaptic spectrin-based cytoskeletons
(Fig. 3j), which then organize additional synaptic aspects. However,
such a solitary role cannot fully explain the observed phenotypes. The
reduction in bouton number associated with cytoskeletal disruption is
milder than that following teneurin disruption. Also, although
active zone dynamics are affected by cytoskeletal perturbation, defects
in apposition are not. Moreover, the T-bar structural defects more
closely resemble synapse adhesion and active zone formation
defects. Thus, Teneurins may regulate release site organization
and synaptic adhesion independent of the cytoskeleton (Fig. 3j).

Our data also indicate that Teneurins act bi-directionally across the
synaptic cleft. Ten-a acts predominantly in neurons, as evidenced by
localization, phenotypes caused by neuronal (but not muscle) knock-
down, and mutant rescue by neuronal (but not muscle) expression
(Figs 2 and 3 and Supplementary Figs 2-4, 6, 7 and 9). Yet, in addition
to the presynaptic phenotypes, many others were postsynaptic, includ-
ing reduced muscle spectrin, SSR, and membrane apposition (Fig. 3
and Supplementary Figs 7-9). Similarly, although Ten-m is present
both pre- and postsynaptically, muscle knockdown resulted in pre-
synaptic defects, including microtubule and vesicle disorganization,
reduced active zone apposition, and T-bar defects (Figs 2 and 3 and
Supplementary Figs 3, 4, 6 and 7). Thus, Teneurins function in
bi-directional trans-synaptic signalling to organize neuromuscular
synapses. This may involve downstream pathways or simply establish
an organizational framework by the receptors themselves. Moreover,
as the results of single disruptions of neuronal ten-a or muscle ten-m
were similarly severe and not enhanced by combination (Figs 2g and
3d, h and Supplementary Fig. 2k), both Ten-a and Ten-m probably
function in the same pathway. Our finding that Ten-a and Ten-m
co-immunoprecipitate from different cells in vitro and across the
NMJ in vivo (Fig. 1f) further suggests a signal via a trans-synaptic 
complex. Teneurin function, however, may not be solely trans- 
synaptic. In some cases (vesicle density, SSR width), cell-autonomous 
knockdown resulted in stronger phenotypes than knocking down in 
synaptic partners (Supplementary Figs 3, 4, 9 and Supplementary 
Table 1). This suggests cell-autonomous roles in addition
to trans-synaptic Teneurin signalling.

Signalling involving the transmembrane proteins Neurexin and 
Neuroligin also mediates synapse development. In Drosophila, Neurexin
(nrx) and Neuroligin 1 (nlg1) mutations cause phenotypes similar to
teneurin perturbation: reductions in bouton number, active zone organ-
ization, transmission, and SSR width. nlg1 and nrx mutations do not
enhance each other, suggesting that they function in the same path-
way. Consistent with this, we found that nrx and nlg1 mutants exhibited
largely similar phenotypes (data not shown). To investigate the rela-
tionship between the teneurin genes and nrx and nlg1, we focused on
the nlg1 null mutant. Both Nlg1 tagged with enhanced green fluor-
escent protein (Nlg1-eGFP) and endogenous Ten-m occupied a
similar postsynaptic space (Supplementary Fig. 10a). teneurin and
nlg1 loss-of-function mutations also displayed similar bouton number
reductions (Fig. 2e, g), vesicle disorganization (Supplementary Fig. 4),
and ghost bouton frequencies (Supplementary Fig. 9). Other pheno-
types showed notable differences in severity. In nlg1 mutants, there was
a 29% failure of active zone/glutamate receptor apposition (Fig. 2k and 
Supplementary Fig. 10d), compared to 15% for the strongest teneurin 
perturbation. The cytoskeleton of nlg1 mutants, however, was only 
mildly impaired compared to that seen with teneurin perturbations 
(Fig. 3d, h and Supplementary Figs 6 and 7). 

To examine further the interplay of teneurin and nlg1, we analysed
ten-a nlg1 double mutants. Both single mutants were viable, despite 
their synaptic defects. Double mutants, however, were larval lethal. We 
obtained rare escapers, which showed a 72% reduction in boutons, 
compared to a 50-55% decrease in single mutants (Fig. 2e). Active 
zone apposition in double mutants was enhanced synergistically over 
either single mutant (Fig. 2k and Supplementary Fig. 10e). Cytoskeletal 
defects in the double mutant resembled the ten-a mutant (Fig. 3 and 
Supplementary Figs 6 and 7). These data suggest that teneurin genes 
and nrx and nlg1 act in partially overlapping pathways, cooperating to
organize synapses properly, with Teneurins contributing more to 
cytoskeletal organization and Neurexin and Neuroligin to active zone 
apposition (Fig. 3j).

In the accompanying manuscript (ref. 8), we showed that although the basal
Teneurins are broadly expressed in the Drosophila antennal lobe, ele- 
vated expression in select glomeruli mediates olfactory neuron partner 
matching. At the NMJ, this basal level mediates synapse organization. 
Analogous to the antennal lobe, we found elevated ten-m expression at 
muscles 3 and 8 using the ten-m-GAL4 enhancer trap (Fig. 4a). We 
confirmed this for endogenous ten-m, and determined that it was con- 
tributed by elevated Ten-m expression in both nerves and muscles 
(Fig. 4b-g). Indeed, ten-m-GAL4 was highly expressed in select motor 
neurons, including MN3-Ib, which innervates muscle 3 (ref. 29; 
Supplementary Fig. 11c). This elevated larval expression also varied 
along the anterior—posterior axis (Supplementary Fig. 12), and was 
specific for Ten-m, as Ten-a expression did not differ within or between 
segments (data not shown). 

To test whether elevated Ten-m expression in muscle 3 and MN3-Ib 
affects neuromuscular connectivity, we expressed ten-m RNAi using 
ten-m-GAL4. Wild-type muscle 3 was almost always innervated
(Fig. 4h). However, after ten-m knockdown, muscle 3 innervation 
failed in 11% of hemisegments (Fig. 4i, j). This required Ten-m on 
both sides of the synapse, as the targeting phenotype persisted follow- 
ing neuronal or muscle RNAi suppression using tissue-specific GAL80 
transgenes (Fig. 4j). ten-a RNAi did not show this phenotype (Fig. 4j), 
consistent with homophilic target selection via Ten-m. The phenotype
was specific to muscle 3, as innervation onto the immediately proximal
or distal muscle was unchanged (Fig. 4j). The low penetrance is probably
due to redundant target selection mechanisms. Where innervation
did occur, the terminal displayed similarly severe phenotypes to other
NMJs (not shown). Thus, in addition to generally mediating synaptic
organization, Ten-m also contributes to correct target selection at a
specific NMJ.

Figure 4 | High-level Ten-m expression regulates muscle target selection.
a, Representative images of hemisegment A3 stained with antibodies to Dlg
(blue), phalloidin (red), and expressing GFP via ten-m-GAL4 (green). High-level
expression is observed in muscles 3 and 8 and basally in all muscles. b, c, Muscle 3
(b) and 4 (c) NMJs show differential Ten-m (red) but similar Synaptotagmin 1
(Syt1; green) expression (from a ten-m muscle knockdown animal).
d, Quantification of presynaptic Ten-m (red) and Syt1 (green) fluorescence at
muscle 3 and 4 NMJs. MN, motor neuron. e, f, NMJs at muscles 3 (e) and 4
(f) show differential Ten-m (red) but similar Syt1 (green) expression in muscles
(from a ten-m nerve knockdown). g, Quantification of postsynaptic Ten-m (red)
and Syt1 (green) fluorescence at muscle 3 and 4 NMJs. h, i, Representative
images stained with phalloidin (blue) and antibodies to HRP (green) and Dlg
(red) to visualize motor neurons and muscles in control (h) or ten-m-
GAL4>ten-m RNAi larvae (i). m, muscle. j, Quantification of the hemisegment
percentage with failed muscle 3 (red), 4 (black) or 2 (blue) innervation. IR,
interfering RNA. k, l, Representative images of the muscle 6/7 NMJ labelled with
antibodies to Dlg (green) and HRP (magenta). The characteristic wild-type
arrangement of boutons (k) is shifted towards muscle 6 when Ten-m is
overexpressed in that muscle and the innervating motor neurons
(l). m, Quantification of the total bouton percentage on muscles 6 (blue) and 7
(red). All genotypes contain H94-GAL4, additional transgenes are indicated (for
details, see Methods). The Ten-m-mediated shift is abolished by neuronal or
muscle GAL80 transgenes. Scale bars, 100 μm (a), 5 μm (b-i), 10 μm (k, l). In all
cases, n = 12 larvae. ***P < 0.001, **P < 0.01, NS, not significant.

To determine whether Ten-m overexpression could alter connec- 
tivity, we expressed Ten-m in muscle 6 (but not the adjacent muscle 7), 
and the motor neurons innervating both muscles using H94-GAL4. 
Normally, 60% of the boutons at muscles 6/7 are present on muscle 6 
with 40% on muscle 7 (Fig. 4k, m). Ten-m overexpression caused a 
shift whereby 81% of boutons synapsed onto muscle 6 and only 19% 
onto muscle 7 (Fig. 4l, m). This shift also required both neuronal and
muscle Ten-m, as neuronal or muscle GAL80 abrogated it (Fig. 4m). 
The effect was specific because Ten-a overexpression did not alter this 
synaptic balance (Fig. 4m), nor was it secondary to altered bouton
number, which was unchanged (data not shown). Therefore, elevated
Ten-m on both sides of the NMJ can bias target choice. This, combined
with evidence that Ten-m can mediate homophilic interaction in vitro,
supports a trans-synaptic homophilic attraction model at the NMJ as 
in the olfactory system. 

We identified a two-tier mechanism for Teneurin function in synapse 
development at the Drosophila NMJ. At the basal level, Teneurins are 
expressed at all synapses and engage in hetero- and homophilic 
bi-directional trans-synaptic signalling to organize synapses properly 
(Fig. 3j). Supporting this, Teneurins can mediate homo- and hetero-
philic interactions in vitro and heterophilic interactions in vivo
(Fig. 1f). At the synapse, Teneurins organize the cytoskeleton, interact
with α-Spectrin, and enable proper adhesion and release site forma-
tion. Further, elevated Ten-m expression regulates target selection in
specific motor neurons and muscles via homophilic matching and
functions with additional molecules to mediate precise neuromuscular
connectivity. Teneurin-mediated target selection at the NMJ is ana-
logous to its role in olfactory synaptic partner matching (ref. 8). As Teneurins
are expressed broadly throughout the antennal lobe, it remains an 
attractive possibility that they also regulate synapse organization in 
the central nervous system. 


METHODS SUMMARY 


Details of Drosophila stocks, immunostaining, electron microscopy, functional 
assays, construction of epitope-tagged Teneurin constructs, immunoprecipitation, 
imaging and statistical analysis can be found in Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS


Drosophila stocks. All Drosophila strains and controls were raised at 29 °C to 
maximize GAL4 expression. All mutants and transgenes were maintained over 
GFP balancer chromosomes to enable larval selection. Mhc-GAL4 or Mef2-
GAL4 (ref. 31) was used to drive expression in all somatic muscles. Nrv2-GAL4
(ref. 32) and elav-GAL4 (ref. 33) were used to drive expression in all neurons. H94-
GAL4 was used to drive expression in muscles 6, 13 and 4 and their corresponding
motor neurons. daughterless-GAL4 was used to drive expression ubiquitously.
Synj-QF was used to drive expression in all nerves. NP6658-GAL4 (ten-m-GAL4)
was used to drive expression in the pattern of endogenous Ten-m expression (ref. 8). The
Df(X)ten-a deletion was used as a ten-a null mutant. For nlg1 mutants, the 1960
and ex2.3 alleles were used in trans and double mutant larvae with ten-a and nlg1
mutations were obtained using optimized rearing conditions. Because of the
early lethality of the ten-m mutant, and to assess ten-a independently, tissue-
specific RNAi was used to examine teneurin perturbation using the following
RNAi transgenic strains: for ten-m, UAS-ten-m RNAi V51173 and for ten-a (ref. 8),
UAS-ten-a RNAi. The following transgenic strains were used: UAS-Dcr2
(ref. 38), UAS-Fas2 (ref. 34), UAS-mCD8GFP (ref. 39), UAS-Nlg1-eGFP
(ref. 28), UAS-Ten-a (ref. 8), P{GS}9267 for ten-m overexpression. In all cases,
the efficacy of RNAi transgenes, overexpression transgenes and the ten-a deletion 
mutant were assessed and verified by the alteration of antibody staining (loss, 
reduction or increase) using tissue-specific GAL4 drivers. For all cases, N-IR 
indicates neuronal RNAi, M-IR indicates muscle RNAi, U-IR indicates ubiquitous 
RNAi. For rescuing ten-a mutants, Ten-a N indicates neuronal overexpression 
with elav-GAL4 of UAS-ten-a and Ten-a M indicates muscle overexpression of 
UAS-ten-a with Mef2-GAL4. 

Immunostaining. Wandering third instar larvae were processed as previously 
described. The following primary antibodies were used: mouse antibody to
Ten-m (mAb20, 1:500), guinea pig antibody to Ten-a (1:100), mouse antibody
to Brp (mAbnc82, 1:250), rabbit antibody to Synaptotagmin 1 (1:4,000), mouse
antibody to Cysteine String Protein (mAb6D6, 1:100), mouse antibody to Dlg
(mAb4F3, 1:500), rabbit antibody to Dlg (1:40,000), mouse antibody to
α-Spectrin (mAb3A9, 1:50), mouse antibody to Fasciclin 2 (mAb1D4, 1:20),
rabbit antibody to Fasciclin 2 (1:5,000), mouse antibody to Futsch (mAb22C10,
1:50), rabbit antibody to DGluRIII (1:2,500), rat antibody to Elav
(mAb7E8A10, 1:25), mouse antibody to Even-skipped (mAb3C10, 1:100),
rabbit antibody to β-Spectrin (1:1,000), mouse antibody to Hts (1:500), guinea
pig antibody to Wsp (1:1,000). Alexa488-, Alexa546- or Alexa647-conjugated
secondary antibodies were used at 1:250 (Invitrogen). Texas-Red-conjugated 
Phalloidin was used at 1:300. FITC-, Cy3- or Cy5-conjugated antibodies to HRP 
were used at 1:100 (Jackson ImmunoResearch). 

Electron microscopy. Wandering third instar larvae were processed and sec-
tioned as described. Sections were imaged on a JEM-1400 (JEOL) transmission
electron microscope at ×3,000 to ×20,000 magnification.

Electrophysiology. Larvae were dissected in HL3 saline containing 0 mM Ca²⁺
and 4 mM Mg²⁺. They were then transferred to saline containing 0.6 mM Ca²⁺
and recordings were conducted by impaling larval muscle 6 in body wall segments A3
and A4 using sharp intracellular electrodes (10-20 MΩ), fabricated from
borosilicate glass capillaries and filled with 3 M KCl solution. For evoked EPSPs,
severed nerve bundles were stimulated using a suction electrode connected to a
linear stimulus isolator (A395, World Precision Instruments). Data, acquired
using Multiclamp 700B amplifiers (Molecular Devices), were low-pass filtered at
3 kHz and digitized at 10 kHz. Recordings were acquired and analysed using Igor
Pro software (Wavemetrics) and custom-written programs. All recordings in
which the resting membrane potential was higher than −60 mV and/or whose
resting potential, input resistance or access resistance changed by more than 20%
during data acquisition were excluded from analysis. All recordings
and data analyses were performed blind to the genotype. Quantal content was
corrected for nonlinear summation.
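
The exclusion rules and the quantal-content calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' Igor Pro code: it assumes a commonly used Martin-style correction, EPSP' = EPSP / (1 − EPSP / driving force), with a reversal potential of 0 mV, and the function names and example values are hypothetical. The correction actually used in the paper is the one given in its cited reference and may differ.

```python
def passes_quality_control(resting_mv, start_values, end_values, max_change=0.20):
    """Return True if a recording survives the stated exclusion rules.

    resting_mv: resting membrane potential in mV.
    start_values / end_values: dicts holding resting potential, input
    resistance or access resistance at the start and end of acquisition.
    """
    if resting_mv > -60.0:  # more depolarized than -60 mV: exclude
        return False
    for name, start in start_values.items():
        change = abs(end_values[name] - start) / abs(start)
        if change > max_change:  # drifted by more than 20%: exclude
            return False
    return True


def correct_nonlinear_summation(epsp_mv, resting_mv, reversal_mv=0.0):
    """Martin-style correction (assumed form; reversal potential assumed 0 mV)."""
    driving_force = abs(resting_mv - reversal_mv)
    if not 0.0 <= epsp_mv < driving_force:
        raise ValueError("EPSP amplitude must be smaller than the driving force")
    return epsp_mv / (1.0 - epsp_mv / driving_force)


def quantal_content(epsp_mv, mepsp_mv, resting_mv, reversal_mv=0.0):
    """Corrected evoked EPSP amplitude divided by mean mEPSP amplitude."""
    corrected = correct_nonlinear_summation(epsp_mv, resting_mv, reversal_mv)
    return corrected / mepsp_mv
```

Because the correction grows with EPSP size, a simple ratio of the percentage changes in EPSP and mEPSP amplitude need not equal the reported change in quantal content.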

Larval locomotion. Crawling assays were conducted as described.

FM1-43 dye loading. FM1-43 (Invitrogen) dye loading was conducted as
described with the following modifications: loading was conducted in 1.5 mM
Ca²⁺, 90 mM K⁺ saline for 1 min followed by six 2-min washes in 0 mM Ca²⁺
saline. Imaging was conducted on a Zeiss LSM 510 Meta Confocal (Carl Zeiss)
with a ×40, PlanApo NA 1.0 water immersion lens (Carl Zeiss).

Construction of epitope-tagged teneurin transgenes. The ten-m and ten-a 
coding sequences lacking the stop codon were cloned into pENTR-D/TOPO 
(Invitrogen) from pENTR-ten-m and pENTR-ten-a. pENTR-ten-m(-stop) and
pENTR-ten-a(-stop) were recombined into the destination vectors pUASTattB-
gtw-tFHAH and pQUASTattB-gtw-tFMH, respectively, using LR Clonase II
(Invitrogen). pUASTattB-gtw-tFHAH is a pUASt-Gateway-attB based vector with
a C-terminal TEV recognition site and 3×Flag, 3×HA and 10×His tags.
pQUASTattB-gtw-tFMH is a pQUASt-Gateway-attB based vector with a
C-terminal TEV recognition site and 3×Flag, 6×Myc and 10×His tags. The
resulting constructs were verified by restriction digest and sequencing and inte-
grated into the attP24 or 86Fb landing sites on the second and third chromo-
somes. Transgenic flies were verified by immunoprecipitation on western blot
and overexpression experiments.

Immunoprecipitation, western blots and SDS-PAGE analysis. For Ten-m and 
Ten-a, QUAS-Ten-a-Flag-Myc was expressed in nerves using Synj-QF and UAS-
Ten-m-Flag-HA in muscles using Mhc-GAL4. Larval synaptosomes were prepared
from larval body wall fillets as described. For Ten-m and α-Spectrin, control
larvae consisted of Mef2-GAL4 without UAS-Ten-m-Flag-HA whereas experi-
mental flies combined the two. Immunoprecipitation was conducted as described
using M2-anti-Flag-conjugated agarose (Sigma) or Affi-Prep Protein A beads
(Bio-Rad) and rat antibodies to HA (Roche). Proteins were separated on
NuPAGE 3-8% Tris-Acetate Gels (Invitrogen) and transferred to nitrocellulose.
Primary antibodies were applied overnight at 4 °C and secondary antibodies at
21 °C for 1 h. The following primary antibodies were used: mouse antibody to
α-Spectrin (mAb3A9, 1:2,000), mouse antibody to Brp (mAbnc82, 1:100), mouse
antibody to Flag (M2, 1:5,000, Sigma-Aldrich), mouse antibody to Myc (3E10, 
1:1,500, Santa Cruz Biotechnology), rat antibody to HA (3F10, 1:1,500, Roche). 
HRP-conjugated secondary antibodies (Jackson ImmunoResearch) were used at 
1:10,000. Blots were developed using the SuperSignal West Femto Maximum 
Sensitivity Substrate (ThermoScientific). 

Imaging analysis. Larvae were imaged with a Zeiss LSM 510 Meta laser-scanning 
confocal microscope (Carl Zeiss) using either a ×63 1.4 NA or a ×40 1.0 NA
objective. NMJ images were taken as confocal z-stacks with the upper and lower 
bounds defined by HRP staining unless otherwise noted. For all metrics, boutons 
were assessed in segment A3 at muscle 6/7 and muscle 4 on both the left and right 
sides. Fluorescence intensity measurements were taken from terminals on muscle 
4. All phenotypes, however, were observed at all synapses regardless of muscle 
fibre or segment. For membrane organization, vesicle distribution and Teneurin 
colocalization, NMJ images were taken as single optical sections at the precise 
centre of the bouton as determined by HRP staining. Images were processed with 
the LSM software and Adobe Photoshop CS4. Bouton number, active zone/ 
glutamate receptor apposition, fluorescent intensity and microtubule organization 
were quantified as previously described. Targeting errors for each larva were quan-
tified as the percentage of hemisegments from A1 to A7 in a single animal with a
failure of target innervation. There was no difference in targeting errors based on
body wall segment. Experiments using H94-GAL4 were conducted as described,
and their effects confirmed using Fasciclin 2 overexpression (control = 58.1% of
boutons on muscle 6, 41.9% on muscle 7; Fas 2 overexpression = 73.0% on muscle
6, 27.0% on muscle 7; n = 8 animals for each, P < 0.0001).
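
The two percentage metrics in this paragraph (targeting errors per larva and the split of boutons between muscles 6 and 7) amount to simple proportions. A minimal sketch, with hypothetical function names and made-up counts rather than data from the paper:

```python
def targeting_error_percent(innervated):
    """Percentage of scored hemisegments (A1-A7, left and right) in one
    larva where the target muscle was not innervated.

    innervated: list of booleans, True if the hemisegment was innervated.
    """
    if not innervated:
        raise ValueError("no hemisegments scored")
    failures = sum(1 for ok in innervated if not ok)
    return 100.0 * failures / len(innervated)


def bouton_distribution(boutons_m6, boutons_m7):
    """Percentage of muscle 6/7 boutons found on muscle 6 and on muscle 7."""
    total = boutons_m6 + boutons_m7
    if total == 0:
        raise ValueError("no boutons counted")
    return 100.0 * boutons_m6 / total, 100.0 * boutons_m7 / total
```

For example, one failed hemisegment out of 14 scored gives a targeting error of about 7.1% for that animal, and 60 boutons on muscle 6 with 40 on muscle 7 gives a 60%/40% split.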

In electron micrographs, parameters were quantified as previously described 
using ImageJ (NIH). T-bar defects were classified into one of five categories:
normal (no discernible defect), double (two T-bars were observed in the same, 
continuous active zone), detached (where the T-bar was clearly visible but was not 
explicitly connected to the membrane associated with the nearest PSD), apposite 
contractile tissue (where the T-bar was not apposed to the SSR, but rather, the 
contractile tissue of the muscle), misshapen (where an electron-dense T-bar was 
visible but did not conform to the ‘T’ shape. Often, the T-bars were ‘X’ shaped). For 
Fig. 2r, each defect is expressed as a percentage of the total number of T-bars 
observed in a particular genotype. 
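
As an illustration of how the Fig. 2r values are expressed, the tally below converts per-T-bar category labels into percentages of the total for one genotype. The category names follow the text; the function name and the example labels are hypothetical.

```python
from collections import Counter

# The five categories named in the text.
TBAR_CATEGORIES = (
    "normal",
    "double",
    "detached",
    "apposite contractile tissue",
    "misshapen",
)


def tbar_defect_percentages(labels):
    """Express each T-bar category as a percentage of all T-bars scored.

    labels: one category label per T-bar observed in a given genotype.
    """
    counts = Counter(labels)
    unknown = set(counts) - set(TBAR_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no T-bars scored")
    return {cat: 100.0 * counts[cat] / total for cat in TBAR_CATEGORIES}
```

Each T-bar carries a single label here, so the percentages sum to 100 within a genotype.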

For Ten-m gradient calculation, single optical sections were taken through the 
centre of the NMJ on muscle 3 or muscle 4, as determined by HRP immunoreacti- 
vity. The GFP signal (ten-m-GAL4) or antibody signal was then measured using 
ImageJ (NIH). For each larva, measurements were taken on the right and left sides
of each indicated segment. The fluorescence for each segment was expressed as a 
percentage of the fluorescence from segment A1 in the same animal, on the same 
side of the larvae. For all larvae, segment A1 represented the maximal fluorescence. 
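
The normalization described for the Ten-m gradient can be sketched as below: each segment's fluorescence is divided by the segment A1 value from the same side of the same animal. The function name, the dictionary layout and the intensity values are assumptions made for illustration.

```python
def normalize_to_a1(intensity_by_segment):
    """Express each segment's fluorescence as a percentage of segment A1.

    intensity_by_segment: mapping such as {"A1": 200.0, "A2": 150.0, ...}
    for one side (left or right) of one larva.
    """
    a1 = intensity_by_segment["A1"]
    if a1 <= 0:
        raise ValueError("segment A1 fluorescence must be positive")
    return {seg: 100.0 * value / a1 for seg, value in intensity_by_segment.items()}
```

With A1 as the reference, A1 is always 100% and, where A1 is the brightest segment, every other segment falls at or below 100%.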

Statistical analysis. Statistical analysis used GraphPad Prism 5 (GraphPad
Software). In all cases involving more than two samples, significance was calcu- 
lated using ANOVA followed by a Dunnett post-hoc test to the control sample and 
a Bonferroni post-hoc test among all samples. For two-sample cases, an unpaired 
Student’s t-test was used to assess significance, unless otherwise indicated. In all 
cases, both methods provided similar significance measurements. In all figures, 
significance is with respect to control genotypes unless otherwise noted. 
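
The sketch below illustrates only two small pieces of this procedure: a Bonferroni adjustment of a set of P values and the mapping from a P value to the star notation used in the figure legends. It does not reproduce the ANOVA or the Dunnett test, which the authors ran in GraphPad Prism; the function names and example P values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_label(p):
    """Map a P value to the notation used in the figure legends."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# Example: three pairwise comparisons with made-up raw P values.
adjusted = bonferroni_adjust([0.0002, 0.01, 0.5])
labels = [significance_label(p) for p in adjusted]
```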
Transcription factor PIF4 controls the
thermosensory activation of flowering 


S. Vinod Kumar, Doris Lucyshyn, Katja E. Jaeger, Enriqueta Alós, Elizabeth Alvey, Nicholas P. Harberd & Philip A. Wigge


Plant growth and development are strongly affected by small dif-
ferences in temperature. Current climate change has already
altered global plant phenology and distribution, and projected
increases in temperature pose a significant challenge to agricul-
ture. Despite the important role of temperature on plant develop-
ment, the underlying pathways are unknown. It has previously
been shown that thermal acceleration of flowering is dependent
on the florigen, FLOWERING LOCUS T (FT). How this occurs
is, however, not understood, because the major pathway known to
upregulate FT, the photoperiod pathway, is not required for thermal
acceleration of flowering. Here we demonstrate a direct mechanism
by which increasing temperature causes the bHLH transcription 
factor PHYTOCHROME INTERACTING FACTOR4 (PIF4) to 
activate FT. Our findings provide a new understanding of how plants 
control their timing of reproduction in response to temperature. 
Flowering time is an important trait in crops as well as affecting 
the life cycles of pollinator species. A molecular understanding of 
how temperature affects flowering will be important for mitigating 
the effects of climate change. 

Arabidopsis thaliana, like many higher plants, responds to warmer 
ambient temperatures by increasing its growth rate and accelerating 
the floral transition’*”. Arabidopsis is a facultative long-day plant, and 
plants grown under short photoperiods are dramatically delayed in 
flowering. Interestingly, late flowering in short days can be overcome 
by growth at higher temperatures. The underlying mechanism is,
however, unknown. The flowering response to temperature is dependent
on the floral pathway integrator gene FT, indicative of a thermosensory
pathway that upregulates FT expression independently of daylength.
Because the bHLH transcription factor PIF4 has been shown to regulate
architectural responses to high temperature, we tested if PIF4 is
required for the induction of flowering at high temperature in short 
photoperiods. Although pif4-101 was slightly delayed in flowering at 
22 °C, pif4-101 mutants showed a striking loss of thermal induction of 
flowering at 27 °C (Fig. 1a, b). To test if pif4-101 perturbed floral
induction by affecting FT expression, we examined the thermal induc- 
tion of FT in Col-0 and pif4-101. Although FT expression was strongly 
thermally inducible in Col-0, this response was largely abolished in 
pif4-101 at 27°C (Fig. 1c), indicating that PIF4 is necessary for the 
thermal acceleration of flowering in short days. By contrast, PIF4 is 
not required for the thermosensory induction of flowering under 
continuous light, suggesting that the photoperiod pathway also
interacts with the ambient temperature sensing pathway. The reduced
role of PIF4 under continuous light probably reflects the instability of
PIF4 in light coupled with the fact that the output of the photoperiod
pathway, CONSTANS (CO) protein, is stabilized by light, shifting
the balance of floral induction from PIF4 to the photoperiod 
pathway. Because PIF4 is necessary for the thermal induction of 
flowering in short days, we tested if it is sufficient to trigger flowering 
when overexpressed. 35S::PIF4 caused extremely early flowering
(Fig. 1d, e), similar to the effect of overexpressing a related gene,
PHYTOCHROME INTERACTING FACTOR5 (ref. 12), suggesting
that PIF4 is limiting for the acceleration of flowering at lower temper- 
ature in short photoperiods. Consistently, 35S::PIF4 plants showed 
elevated levels of FT (Fig. 1f). Furthermore, 35S::PIF4 ft-3 showed a 
complete suppression of the early flowering phenotype, indicating that 
the induction of flowering by 35S::PIF4 was dependent on FT (Fig. 1g, h). 
This activation of FT appears to be independent of the established 
Figure 1 | PIF4 is necessary for the thermal induction of flowering in short
photoperiods. a, pif4-101 plants do not show acceleration of flowering at 27 °C 
compared with Col-0. Inset, a 16-week-old pif4-101 plant grown at 27 °C. 

b, Rosette leaf numbers at flowering for Col-0 and pif4-101 grown at 22 and 
27 °C in short-photoperiod conditions (error bars, SD; n = 6). c, FT expression,
measured by quantitative polymerase chain reaction (qPCR) in 4-week-old
plants at 22 and 27 °C under short photoperiods, is thermally induced in a PIF4-dependent
manner (data from three biological replicates; error bars, SD). d, 35S::PIF4
overexpression triggers very early flowering. e, Leaf numbers at flowering for 
Col-0 and 35S::PIF4 in long photoperiods (error bars, SD; n = 5). f, CO and FT 
gene expression data measured by qPCR in Col-0 and 35S::PIF4 at 21 °C in long 
photoperiods (samples taken 2 weeks after sowing; data from three biological 
replicates; all error bars, SD). g, FT is required for the early flowering phenotype 
of 35S::PIF4 plants. When crossed into the ft-3 background, the early flowering 
of 35S::PIF4 is completely suppressed. Inset, top view of the 35S::PIF4 ft-3 plant 
showing that petiole elongation growth is retained. h, Flowering time data for 
Col-0, Ler, 35S::PIF4, ft-3 and 35S::PIF4 ft-3 plants (error bars, SD; n = 5). 
photoperiod pathway because CO did not change in response to
35S::PIF4 (Fig. 1f). Finally, although co-9 mutants are late flowering,
we found 35S::PIF4 co-9 plants were early flowering, indicating that
PIF4 acts largely independently of CO (Supplementary Information
and Supplementary Fig. 1), consistent with the thermal induction of
flowering being independent of the photoperiod pathway (Fig. 1f).
Although PIF4 has been shown to be important for high-temperature 
responses, long-term increases in either PIF4 transcript or PIF4 protein 
levels in response to higher ambient temperature that can account for 
the observed growth responses have not been detected. To examine if
variation of PIF4 transcription under our experimental conditions 
might account for the increases in PIF4 activity with temperature, 
we measured PIF4 transcript levels at 12, 17, 22 and 27 °C in seedlings 
(Fig. 2a). PIF4 transcript levels increased from 12 °C to 22 °C, whereas 
the difference between 22 °C and 27 °C was not statistically significant. 
Plants at 27°C, compared with 22°C, showed a very large PIF4- 
dependent response, suggesting that variation in the PIF4 transcript 
is not sufficient to account for the acceleration of flowering at 27 °C 
compared with 22 °C. To test whether temperature-mediated changes 
in PIF4 transcription are rate limiting for the biological response, we 
analysed the behaviour of plants constitutively expressing PIF4. 
Although 35S::PIF4 plants at 22 °C were extremely early flowering, this 
phenotype could be largely suppressed at 12°C (Fig. 2b and Sup- 
plementary Fig. 2), indicating that even when PIF4 transcript is abund- 
ant, lower temperatures are inhibitory for PIF4 activity. A possible 
explanation for this difference is that PIF4 protein is destabilized by 
low temperature. Indeed, PIF4 protein levels have already been shown 
to be strongly regulated by light, and growth in red and blue photocycles
destabilizes PIF4 protein at low temperatures. The PIF4 overexpression
lines contain a fusion to the haemagglutinin (HA) epitope
(35S::PIF4:HA). We therefore examined the levels of PIF4:HA protein
at 12, 17, 22 and 27 °C under the same light conditions used for our
flowering time assays. Consistent with previous studies, we saw a
strong accumulation of PIF4 at the end of the night period, which 
was subsequently degraded during the day. Despite the suppression 
of early flowering in 35S::PIF4 at 12 °C compared with 22 °C (Fig. 2b), 
we did not observe an appreciable difference in PIF4 protein levels at 
Figure 2 | Regulation of PIF4 by temperature. a, Transcriptional regulation
of PIF4 by temperature. Ten-day-old Col-0 seedlings grown at 12, 17, 22 and 
27 °C under short photoperiods were analysed for PIF4 expression by qPCR. 
Data shown are from three biological replicates. Error bars, s.d. b, Flowering 
phenotype of 35S::PIF4 is temperature dependent. Although 35S::PIF4 plants 
(right) flower very early at 22 °C (upper panel) compared with Col-0 (left), this 
phenotype is largely suppressed by growth at 12 °C (lower panel). c, PIF4 
protein levels in 35S::PIF4 plants are not affected by growth temperature. 
Seven-day-old 35S::PIF4:HA seedlings grown at 17 °C were transferred to 12, 
17, 22 and 27 °C under short photoperiods for 2 days and samples were 
collected at the end of night before light and after 4 h under illumination.
Although PIF4:HA protein levels are independent of growth temperatures, the
protein is robustly degraded in the presence of light. PIF4:HA protein was detected
by horseradish peroxidase (HRP)-conjugated HA antibody. Stained lower half 
of the gel used for immunoblot is shown as loading control. 


these two temperatures that was likely to account for these different 
phenotypes (Fig. 2c and Supplementary Fig. 3). Slightly higher levels 
of PIF4:HA appeared to be present at 27°C (Fig. 2c), suggesting 
high-temperature stabilization of PIF4 may also contribute to higher 
PIF4 activity at 27 °C. 

Taken together, these data indicate that PIF4 regulates FT in a 
temperature-dependent manner. To determine whether this is likely to be the
case in planta, we analysed the spatial expression of FT and PIF4. FT
has a distinctive pattern of expression in the vasculature of the leaf,
and, significantly, PIF4 was expressed in the same domain (Fig. 3a).
Because the regulation of FT by PIF4 could be either direct or indirect, 
we used chromatin immunopurification (ChIP) to analyse if PIF4 
binds directly to the FT promoter proximal to the transcriptional start 
site. This region of the promoter was chosen because it has been shown 
to be both phylogenetically conserved and the site for light-mediated 
regulation of FT expression. We observed robust enrichment of
PIF4 near to the transcriptional start site (Fig. 3b), indicating that PIF4 
binds this region in vivo to activate FT expression. 
Figure 3 | PIF4 directly binds the FT promoter in a temperature-dependent
manner. a, β-Glucuronidase (GUS) histochemical analysis of the expression
domains of FT and PIF4 in rosette leaves. b, ChIP analysis shows PIF4 binding
to the FT locus in vivo in seedlings. The At5g45280 promoter is a positive control
for PIF4-binding activity; the HSP70 promoter was used as a negative control.
c, Summary of the structure of the FT promoter and positioning of qPCR 
amplicons for ChIP analysis. d, ChIP analysis of 35S::PIF4:HA at 12 and 27 °C 
(2-week-old seedlings, short photoperiods). e, ChIP analysis of PIF4::PIF4:ProA 
at 17, 22 and 27 °C (4-week-old soil-grown plants, short photoperiods). 

f, Analysis of H2A.Z occupancy at the FT locus at 17 and 27 °C (3-week-old 
plate-grown plants, short photoperiods). g, ChIP analysis of PIF4 binding to FT 
in Col-0 and arp6-1 (3-week-old soil-grown plants, 22 °C short photoperiod). 
(For all ChIP experiments, plant materials were collected at the end of dark 
period before lights came on and were protected from light until frozen. All data 
presented are from two independent ChIP experiments; all error bars, SD.) 
Given the striking effect of ambient temperature on PIF4 activity,
which occurs even when PIF4 is constitutively expressed, we hypothesized 
that the ability of PIF4 to bind the FT promoter may be temperature 
dependent. To test this, we performed ChIP experiments using 
35S::PIF4 plants grown at 12 and 27°C with primers flanking an 
E-box in the FT promoter (Fig. 3c). Strikingly, we observed a very strong 
temperature dependence for this binding, with an approximately 
fivefold increase in binding at 27 °C compared with 12°C (Fig. 3d). 
This indicates that the later flowering of 35S::PIF4 at 12 °C is caused by a
decrease in PIF4 binding to FT. Because the 35S promoter causes strong 
ectopic expression of PIF4, we sought to confirm that PIF4 protein 
expressed at endogenous levels displays similar temperature-dependent 
binding to the FT promoter. We therefore performed ChIP experiments 
on a pif4-101 line complemented with PIF4::PIF4:ProteinA (Supplementary
Fig. 4). Consistent with the overexpression studies, we
observed a strong increase in PIF4 binding to FT as a function of 
temperature. Reduced binding was observed at 17 °C, consistent with 
the very late flowering of plants under short days at low temperature, 
but this binding increased at 22°C and was even higher at 27°C 
(Fig. 3e). The temperature-dependent binding of PIF4 to FT could be 
due to growth temperature influencing the affinity of the PIF4 tran- 
scription factor for its binding site, or the efficiency of the ChIP could be 
affected by the temperature at which tissues were grown. To test these 
possibilities, we analysed another recently described PIF4 target 
locus, CYP79B2 (At4g39950), which is upregulated in 35S::PIF4 (Supplementary
Fig. 5a). We found PIF4 binding to occur constitutively at
both 12 and 27 °C at a region in the first exon (Supplementary Fig. 5b).
Another region further upstream in the promoter showed a temper- 
ature-dependent binding of PIF4, and, in both cases, no enrichment was 
seen for a control locus (Supplementary Fig. 5b). This indicates that the 
abundant PIF4 protein we observed at 12 °C is active and able to bind 
target sites, and confirms that the ChIP method per se is not influenced 
by the temperature at which the sample is grown, consistent with other 
studies. The ability of PIF4 to bind loci in a more temperature-independent
manner might explain why 35S::PIF4 at 12 °C maintains
hypocotyl and petiole elongation, while early flowering is strongly 
suppressed. We do not exclude that temperature may also influence 
PIF4 activity post-translationally. 

Temperature signals are mediated through H2A.Z-nucleosomes in 
Arabidopsis, suggesting that temperature may be increasing the
accessibility of the PIF4-binding site at the FT promoter. Consistent 
with this hypothesis, we found that H2A.Z-nucleosomes were present 
at the PIF4-binding site in the FT promoter. Furthermore, we found 
that the levels of H2A.Z-nucleosomes at the FT promoter decreased 
with higher temperature (Fig. 3f). These results suggest that the pres- 
ence of H2A.Z-nucleosomes is limiting for PIF4 binding to FT, and 
that the PIF4 binding we observed at higher temperature is due to the 
greater accessibility of chromatin containing H2A.Z-nucleosomes at 
higher temperature. This suggests that in the absence of H2A.Z- 
nucleosomes, PIF4 should bind FT more strongly. We therefore 
compared the ability of PIF4 expressed under its own promoter to 
bind to the FT promoter in wild type and arp6-1, a background lacking 
incorporation of H2A.Z-nucleosomes. Interestingly, we observed 
considerably greater binding of PIF4 in arp6-1 (Fig. 3g), indicating 
that H2A.Z-nucleosomes are rate limiting for PIF4 to activate FT 
expression. The eviction of H2A.Z-nucleosomes by higher temper- 
ature therefore provides a direct mechanism for the temperature- 
regulated expression of FT (Fig. 4c). Consistent with our previous 
results and the established role of H2A.Z in regulating temperature- 
dependent gene expression, we found that there is increased PIF4 
messenger RNA in the arp6-1 background (Supplementary Fig. 6).
However, our results for 35S::PIF4 suppression by 12 °C indicate that 
transcriptional upregulation of PIF4 is not the rate-limiting step in 
regulating PIF4-mediated flowering at higher temperatures. 

Our results indicate that the temperature-dependent regulation of 
FT by PIF4 is controlled at the level of chromatin accessibility of the 
Figure 4 | PIF4 integrates environmental signals. a, Suppression of
flowering at 12 °C is significantly reduced in the absence of DELLA-mediated
repression. b, Leaf number at flowering for della global is reduced
compared with Ler at 12 °C (error bars, SD; n = 6). c, Representation of 
temperature-dependent FT regulation by PIF4. Temperature-induced H2A.Z 
nucleosome dynamics can regulate PIF4 binding to target loci for 
transcriptional activation. 


FT promoter and possibly at the level of PIF4 protein activity. PIF4 
activity is controlled through the repressive activity of DELLA proteins 
that prevent PIF4 binding DNA.

Consistently, plants having reduced or absent DELLA function are 
early flowering. We hypothesized that delay in flowering at lower
temperatures might at least in part be due to DELLA-mediated repres- 
sion of PIF4 activity. If so, it would be expected that absence of 
DELLAs should cause accelerated flowering at lower temperatures. 
In accord with this expectation, we found that a mutant lacking 
DELLAs flowered much earlier than wild type when grown at 12 °C 
(Fig. 4a, b). The phytohormone gibberellin triggers DELLA protein 
degradation, and plays a key permissive role for FT induction, because 
in a gibberellin-deficient background, gibberellin application increases 
FT expression 15-fold. Although it was proposed more than 50 years
ago that gibberellins are upstream of florigen, the mechanism has not
been clear. As DELLA proteins have been shown to be key regulators 
by which gibberellin influences PIF4, our finding that PIF4 is able to 
activate FT directly suggests a possible mechanism by which changes 
in gibberellin levels may influence flowering. 

Climate change has already caused measurable changes in plant 
phenology and behaviour, and plants that incorporate temperature
information into their life cycles appear to be able to adapt to warmer
conditions more effectively than those that primarily rely on photoperiod
to synchronize their lifestyles. The importance of the effects of
climate change on yield is highlighted by the significant detrimental
effects of increasing temperatures on yield. PIF4 is a central integrator
of environmental information in the plant and our finding that it 
activates FT at higher temperatures suggests it will be a key node for 
breeding crops resilient to climate change. This importance is suggested
by the recent discovery that natural variation at PIF4 plays a
major role in key ecological traits.
METHODS SUMMARY


Detailed descriptions of the plant growth conditions, growth assays, transgenic 
constructs and ChIP techniques are provided in the Supplementary materials. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Plant material and growth conditions. All plant lines used were in Col-0 
background unless otherwise specified. pif4-101 mutant was provided by 
C. Fankhauser, HA-tagged 35S::PIF4 by S. Prat. All references to ‘35S::PIF4’
and ‘PIF4:HA’ refer to this line: that is, 35S::PIF4:HA. FT::GUS was obtained from
K. Goto. 35S::PIF4:HA co-9 was obtained by crossing. The crosses were
genotyped for presence of the 35S::PIF4 construct by PCR on genomic DNA using 
primers 2362 and 2363, resulting in two products of different size representing the 
complementary DNA (cDNA) transgene and the genomic DNA fragment, 
respectively. co-9 was genotyped with primers 3650 and 3652 for insertion, and 
3291 and 3292 for the wild-type fragment. For genotyping phyb-9, DNA was 
amplified using oligonucleotides 2137 and 2138 followed by MnlI digestion to
distinguish between wild-type and mutant alleles. The global della mutant is in the
Ler background and was described previously. PIF4::PIF4:ProteinA and
PIF4::PIF4:GUS were constructed by amplifying the genomic fragment of PIF4 
including the promoter with oligonucleotides 1534 and 1535. The PCR product 
was cloned into pENTR/D-TOPO (Invitrogen) and inserted into the binary plas- 
mids PW889 (carboxy-terminal ProteinA) and PW395 (carboxy-terminal GUS), 
respectively, using Gateway technology (Invitrogen). Transgenic plants were 
obtained by transforming pif4-101 by floral dip. For hypocotyl measurements, 
seeds were surface sterilized, sown on ½ MS media, stratified for 2 days at 4 °C
in the dark and germinated for 24 h at 22 °C. The plates were then transferred to
short-day conditions (8/16 h photoperiod) at 22 and 27 °C, respectively, and grown
vertically for 10 days before being imaged and the hypocotyl length measured
using the ImageJ software (http://rsbweb.nih.gov/ij/). Oligonucleotide sequences
are provided in Supplementary Table 1. 

Transcript analysis. Samples from plants grown in long days (16/8 h photoperiod)
were collected and total RNA was extracted using Trizol Reagent
(Invitrogen). RNA (2 µg) was treated with DNase I (Roche) and used for cDNA
synthesis (First strand cDNA synthesis kit, Fermentas). cDNA was diluted 1:8 and
used for qPCR with a Roche LightCycler 480 and the corresponding SYBR Green
master mix. To detect FT transcript levels, oligonucleotides 3180 and 3181 were 
used; for CO, oligonucleotides 2951 and 2952. PIF4 transcript levels were analysed 
using oligonucleotides 3952 and 3953. Oligonucleotides 3247 and 3408 amplifying 
TUB6 (At5g12250) were used for normalization. 
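
The normalization to TUB6 described above can be illustrated with a short calculation. The sketch below assumes the widely used 2^-ΔΔCt approach with roughly 100% amplification efficiency; the text does not state which quantification formula was applied, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative transcript level by the 2^-ddCt method.

    ct_target / ct_reference: Ct of the gene of interest (e.g. FT) and of the
    reference gene (e.g. TUB6) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two Ct values in the calibrator
    sample (e.g. the control genotype or temperature).
    Assumes about 100% amplification efficiency for both amplicons.
    """
    d_ct_sample = ct_target - ct_reference          # normalize sample to reference
    d_ct_cal = ct_target_cal - ct_reference_cal     # normalize calibrator to reference
    return 2.0 ** -(d_ct_sample - d_ct_cal)


# Hypothetical Ct values, not taken from the paper.
fold = relative_expression(24.0, 18.0, 27.0, 18.0)
print(fold)
```

A target that crosses threshold three cycles earlier than in the calibrator, with an unchanged reference gene, corresponds to an eightfold higher relative level under these assumptions.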

Immunoblot analysis. To analyse the possible effect of temperature on PIF4 
protein stability, plants overexpressing PIF4:HA (35S::PIF4:HA) were used. Seven- 
day-old 35S::PIF4:HA seedlings grown in short days at 17 °C were transferred to 12, 
17, 22 and 27 °C in short days for 2 days. Samples were collected at the end of night and
thereafter at 30 min, 1 h and 4 h under illumination. Protein samples were separated
by SDS-polyacrylamide gel electrophoresis and transferred on to nitrocellulose
membrane. PIF4:HA was detected using HRP-conjugated HA antibody (Miltenyi
Biotech) and visualized by chemiluminescent detection using Immobilon
Chemiluminescent HRP substrate (Millipore). 

GUS histochemical assay. For GUS-staining, plants were grown on ½ MS plates
in long days (16/8 h photoperiods) for 10 days and kept in the dark for 24 h before
collecting. Plants were stained in buffer containing 100 mM phosphate buffer, pH
7, 10 mM EDTA, 0.1% Triton-X100, 0.5 mM K-ferrocyanide and 1 mM X-Gluc at
37 °C for 24 h before de-staining in ethanol.

ChIP. ChIP was performed as described previously with minor modifications.
35S::PIF4:HA seedlings were grown on ½ MS plates for 10 days and kept in the
dark for 24 h at respective temperatures before collecting. Plant tissue (1.5 g) and
4 µg of antibody (HA-tag antibody ab9110 from Abcam) were used for ChIP. To
analyse the dynamics of PIF4::PIF4:ProteinA, plants were grown in respective 
temperatures under short-day conditions for 4 weeks. Aerial parts of the plants 
were collected and cross-linked before being used for chromatin preparations. 
ChIP was done using magnetic beads (Dynabeads M-270 Epoxy, Invitrogen) 
coated with rabbit IgG (Sigma, 15006) as described (http://www.ncdir.org/ 
protocols/Rout/Conjugation%200f%20Dynabeads.pdf). To analyse H2A.Z 
dynamics at the FT locus in response to temperature, we used 3-week-old seedlings 
of HTA11::HTA11:GFP grown at 17 and 27°C. ChIP was done using anti-GFP 
antibody (Abcam, ab290). To analyse PIF4 binding in Col-0 and arp6-1 back- 
grounds, respective genotypes with PIF4::PIF4:3XFLAG were grown on soil at 
22°C under short photoperiods for 3 weeks before samples were fixed by 
formaldehyde cross-linking. ChIP was performed using anti-Flag M2 affinity gel 
(Sigma A2220). Immuno-complexes were eluted using 3X Flag peptide (Sigma 
F4799) according to the manufacturer’s instructions. Immunoprecipitated DNA 
was eluted after reverse cross-linking by boiling at 95 °C for 1 min in the presence 
of 10% Chelex (BioRad laboratories) followed by treatment with Proteinase 
K. Oligonucleotides 3255 and 3256 for FT-15 region, 3613 and 3614 for FT-cl, 
3607 and 3608 for FT-c and 3261 and 3262 for FT-f were used for detecting 
PIF4 binding to the FT locus. As a positive control for PIF4 binding, 
At5g45280 was analysed using oligonucleotides 2857 and 2958. HSP70 was used 
as a negative control using oligonucleotides 1862 and 1865. To analyse PIF4 
binding at At4g39950, oligonucleotides 4240 and 4241 were used for region 1,
and oligonucleotides 4246 and 4247 were used for region 2. Oligonucleotides 
1860 and 1861 were used for HSP70 as a negative control. Oligonucleotide 
sequences are provided in Supplementary Table 1. 
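
ChIP-qPCR signals such as those compared between temperatures above are often expressed as a percentage of input chromatin. The sketch below shows that calculation under stated assumptions: the percent-input method, a 1% input aliquot and the Ct values are all hypothetical, and the text does not specify how enrichment was computed.

```python
def percent_input(ct_input, ct_ip, input_fraction=0.01):
    """ChIP signal as a percentage of input chromatin.

    ct_input: Ct of the input sample, which represents `input_fraction`
    of the chromatin used for the immunoprecipitation (assumed 1% here).
    ct_ip: Ct of the immunoprecipitated sample for the same amplicon.
    Assumes about 100% amplification efficiency.
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def fold_change(signal_a, signal_b):
    """Ratio of two percent-input values, e.g. warm versus cool samples."""
    return signal_a / signal_b


# Hypothetical Ct values for one amplicon at two growth temperatures.
cool = percent_input(ct_input=25.0, ct_ip=27.0)
warm = percent_input(ct_input=25.0, ct_ip=25.0)
print(cool, warm, fold_change(warm, cool))
```

Comparing percent-input values for the same amplicon keeps differences in chromatin yield between samples from being mistaken for differences in binding.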


©2012 Macmillan Publishers Limited. All rights reserved 


doi:10.1038/nature10897 


Role of corin in trophoblast invasion and uterine 
spiral artery remodelling in pregnancy


Yujie Cui, Wei Wang, Ningzheng Dong, Jinglei Lou, Dinesh Kumar Srinivasan, Weiwei Cheng, Xiaoyi Huang, Meng Liu,
Chaodong Fang, Jianhao Peng, Shenghan Chen, Shannon Wu, Zhenzhen Liu, Liang Dong, Yiqing Zhou & Qingyu Wu


In pregnancy, trophoblast invasion and uterine spiral artery remodel- 
ling are important for lowering maternal vascular resistance and 
increasing uteroplacental blood flow. Impaired spiral artery remodelling
has long been implicated in pre-eclampsia, a major complication of
pregnancy, but the underlying mechanisms remain
unclear. Corin (also known as atrial natriuretic peptide-converting
enzyme) is a cardiac protease that activates atrial natriuretic peptide
(ANP), a cardiac hormone that is important in regulating blood pressure.
Unexpectedly, corin expression was detected in the pregnant
uterus. Here we identify a new function of corin and ANP in promoting
trophoblast invasion and spiral artery remodelling. We show
that pregnant corin- or ANP-deficient mice developed high blood 
pressure and proteinuria, characteristics of pre-eclampsia. In these 
mice, trophoblast invasion and uterine spiral artery remodelling were 
markedly impaired. Consistent with this, ANP potently stimulated
human trophoblasts to invade Matrigels. In patients with pre-eclampsia,
uterine Corin messenger RNA and protein levels were
significantly lower than those in normal pregnancies. Moreover, we
have identified Corin gene mutations in pre-eclamptic patients, which 
decreased corin activity in processing pro-ANP. These results indicate 
that corin and ANP are essential for physiological changes at the 
maternal-fetal interface, suggesting that defects in corin and ANP 
function may contribute to pre-eclampsia. 

Pregnancy poses a serious challenge for maintaining normal blood 
pressure. Pregnancy-induced hypertension, a major cause of maternal 
and fetal deaths, occurs in approximately 10% of pregnancies.
During pregnancy, the uterus undergoes profound morphological 
changes, including trophoblast invasion and spiral artery remodelling. 
In pre-eclampsia, impaired spiral artery remodelling is common, but 
the underlying mechanisms are unclear. Studies indicate that
vascular growth factor receptors, angiotensin and oestradiol are
involved in the disease.

Corin is a cardiac protease that activates ANP, which is a cardiac 
hormone that regulates blood pressure and sodium homeostasis. In
mice, lack of CORIN prevents ANP generation and causes hypertension.
In humans, CORIN variants are associated with hypertension.
Interestingly, Corin expression was detected in the pregnant
mouse (Fig. 1A) and human uterus (Supplementary Fig. 1). As a transmembrane
protein, CORIN is expected to act at the expression sites,
suggesting a possible function in the pregnant uterus. 

To understand the role of CORIN in pregnancy, we created a mouse 
model in which a Corin transgene was expressed under a cardiac 
promoter (Fig. 1B). The transgenic and Corin knockout mice were 
crossed to generate mice expressing Corin only in the heart (‘knockout/transgenic
mice’; Fig. 1C, D). In knockout/transgenic mice, transgenic
Corin expression restored pro-ANP processing in the heart
(Supplementary Fig. 2) and normalized blood pressure (Fig. 1E),
indicating that cardiac CORIN was sufficient to maintain normal
blood pressure in non-pregnant mice. 

In pregnant Corin knockout mice, blood pressure increased at 
approximately 17 days post coitus and rose further before returning 
to the non-pregnant blood pressure level after delivery (Fig. 1F), which 
resembled late gestational hypertension in pre-eclamptic women. In 
Corin knockout/transgenic mice, which were normotensive, blood 
pressure increased similarly during pregnancy (Fig. 1G), indicating 
that cardiac Corin expression did not prevent pregnancy-induced 
hypertension. The data also show that in these mice, hypertension in 
pregnancy was not due to pre-existing high blood pressure. As well as 
in the uterus, Corin mRNA was detected in the umbilical cord and 
placenta (Supplementary Fig. 3). To distinguish the role of maternal 
Corin from that of placental or other fetal organs, Corin knockout 
females were mated with either wild-type or knockout males. The 
resulting fetuses carried one or no copy of the functional Corin gene. 
Normally, a single functional gene copy is sufficient for enzyme
function. As shown in Fig. 1H, pregnant Corin knockout females that
were mated with either wild-type or knockout males had similarly 
increased blood pressure, indicating that lack of maternal, but not fetal, 
Corin caused hypertension in pregnancy. 

Proteinuria is a hallmark of pre-eclampsia. Wild-type, Corin knock- 
out and knockout/transgenic mice had similar urinary protein levels 
before pregnancy and at mid gestation. However, the levels increased 
in Corin knockout and knockout/transgenic mice at late gestation 
(Fig. 1I), consistent with reported proteinuria in mouse models of
pre-eclampsia. Ischaemic glomeruli, indicated by fewer red blood
cells, were found in pregnant Corin knockout and knockout/transgenic
mice (Fig. 1J, a–f) but not in non-pregnant mice (Supplementary Fig. 4).
Periodic acid–Schiff staining revealed increased extracellular matrices
and collapsed glomerular capillaries in pregnant Corin knockout and
knockout/transgenic mice (Fig. 1J, g–i). Electron microscopy showed
narrow glomerular capillary lumens and thick basement membranes 
(Fig. 1K), suggesting endotheliosis and increased extracellular mat- 
rices. Additional pathological features such as necrotic cells and cal- 
cium deposits in the placental labyrinth also existed in these mice 
(Supplementary Fig. 5), indicating insufficient uteroplacental per- 
fusion. Consistent with this, Corin knockout and knockout/transgenic 
mice had smaller litters (7.1 ± 2.3 (n = 28) and 6.8 ± 2.7 (n = 28) pups
per litter, respectively, versus wild-type mice, which had 9.1 ± 1.2
(n = 21) pups per litter; P < 0.001 in both cases).
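
The litter-size comparison above reports only means, spreads and group sizes. As a rough plausibility check, the sketch below computes Welch's t statistic from those summary values. Treating the spreads as standard deviations, the choice of Welch's test and the function name are assumptions for illustration; the text does not state which test produced the reported P values.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) from summary statistics of two groups."""
    v1 = sd1 ** 2 / n1   # squared standard error of group 1
    v2 = sd2 ** 2 / n2   # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Wild type (9.1, 1.2, n = 21) versus Corin knockout (7.1, 2.3, n = 28),
# using the values quoted in the text.
t_ko, df_ko = welch_t(9.1, 1.2, 21, 7.1, 2.3, 28)
print(round(t_ko, 2), round(df_ko, 1))
```

With these inputs the statistic is close to 3.9 on roughly 43 degrees of freedom, which is in line with a two-sided P value below 0.001.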

We examined embryos at embryonic day 12.5 (E12.5), an early time 
point before blood pressure increase in Corin knockout and knockout/ 
transgenic mice, and E18.5 (two days before delivery). Wild-type E12.5 
embryos showed obvious trophoblast invasion, shown by cytokeratin 
staining (Fig. 2a), and large vessels mostly in the deep decidua, shown 
by smooth-muscle α-actin (SMA) staining (Fig. 2b), indicating that
Figure 1 | Hypertension, proteinuria and renal pathology in pregnant
Corin knockout and knockout/transgenic mice. A, Corin mRNA expression
in mouse uteruses. B, Corin transgenic (Tg) construct. C, D, Western blot
analysis of CORIN protein in wild-type (WT), Corin knockout (KO) and
knockout/transgenic mice. E, Blood pressure (BP, mean ± s.d.) in non-pregnant
females. F, G, Blood pressure increased in Corin knockout (F) and
knockout/transgenic (G) mice in pregnancy. Data are mean ± s.d. *P < 0.05
versus WT of the same time point. †P < 0.05 versus non-pregnant level of the
same genotype. H, Similar blood-pressure changes in Corin knockout females
mated with knockout or WT males. I, Late gestational proteinuria in Corin


smooth muscles in the superficial decidua were replaced by invading 
trophoblasts. In contrast, trophoblast invasion in Corin knockout and 
knockout/transgenic embryos was markedly reduced (Fig. 2a) and 
smaller arteries were found in both superficial and deep decidua 
(Fig. 2b). In E18.5 wild-type embryos, more abundant trophoblasts 
were found in the decidua and myometrium compared with those in 
Corin knockout and knockout/transgenic mice (Fig. 2c, d). By haema- 
toxylin and eosin staining, larger and more abundant decidual spiral 
arteries were observed in wild-type than in Corin knockout or knock- 
out/transgenic mice (Fig. 2e). Figure 2f-h shows strong cytokeratin 
(trophoblasts) staining but weak von Willebrand factor (endothelial) 
and SMA (smooth muscle) staining in wild-type decidual and myo- 
metrial arteries. These data indicate that trophoblast invasion and 
spiral artery remodelling were impaired in Corin knockout and knock- 
out/transgenic mice, and that this defect occurred before blood pres- 
sure increased in these mice. 

CORIN activates ANP in the heart but it was unknown whether the
CORIN function in pregnancy was also mediated by ANP. Pro-ANP is 
expressed in the non-pregnant and pregnant uterus (Supplementary 
Fig. 6). If CORIN acts on pro-ANP to promote trophoblast invasion 
and spiral artery remodelling, thereby preventing hypertension in preg- 
nancy, ANP (also known as Nppa) and Corin knockout mice should 
have similar phenotypes. ANP knockout mice are hypertensive (Fig. 3a) 




knockout and knockout/transgenic mice. Data are mean ± s.d. **P < 0.01, n = 
7 or 8 per group. J, a-i, Renal ischaemia in pregnant Corin knockout and 
knockout/transgenic mice. E18.5 sections are stained with haematoxylin and 
eosin (H&E), Masson trichrome or periodic acid-Schiff (PAS). Scale bar, 20 
μm. Red blood cells (yellow arrows) and open capillaries (red arrows) in WT 
glomeruli are shown. K, Narrow glomerular capillary lumen (L) and thick 
basement membranes (red arrows) in Corin knockout and knockout/ 
transgenic mice at E18.5 shown by electron microscopy. GAPDH, 
glyceraldehyde 3-phosphate dehydrogenase; NP, non-pregnant; pA, poly A. 


but their blood pressure was not monitored during pregnancy’’. We 
found similarly increased blood pressure in pregnant ANP knockout 
mice (Fig. 3b). The mice also had late gestational proteinuria (Fig. 3c) 
and smaller litters (4.4 ± 1.7 (n = 25) versus wild-type, 9.1 ± 1.2 
(n = 21) pups per litter, P < 0.001). By immunostaining, impaired 
trophoblast invasion and smaller spiral arteries were observed in 
E12.5 embryos (Fig. 3d, e). In E18.5 embryos, ANP knockout mice 
had far fewer trophoblasts (Fig. 3f, g) and smaller arteries (Fig. 3h) in 
the decidua and myometrium than those in wild-type mice. Consistent 
with this, weak cytokeratin staining but strong von Willebrand factor 
staining were found in arteries in ANP knockout mice (Fig. 3i). Thus, 
ANP and Corin knockout mice had very similar phenotypes, indicating 
that the role of CORIN in pregnancy is probably mediated by ANP. 
In the heart, CORIN produces ANP, which then regulates blood 
pressure by promoting natriuresis and vasodilation’. Here we found 
that lack of CORIN and ANP impaired trophoblast invasion and spiral 
artery remodelling, which was not rescued by cardiac Corin expression 
in Corin knockout/transgenic mice. ANP is known to relax vascular 
smooth muscles. Recently, ANP and its downstream cyclic GMP- 
dependent protein kinase were shown to be important in angiogenic 
processes by promoting endothelial regeneration”””’. Thus, ANP may 
function locally to remodel uterine arteries. Our results also indicate 
that ANP may directly promote trophoblast invasion (Fig. 4a), and we 
Figure 2 | Impaired trophoblast invasion and spiral artery remodelling in 
Corin knockout and knockout/transgenic mice. a, b, E12.5 embryo sections 
were stained for trophoblasts (a) or smooth muscles (b). Fetal (F) and maternal 
(M) sides are indicated. Boxed areas in top panels are shown at a higher 
magnification (×200). c, E18.5 embryo sections were stained for trophoblasts. In 
lower panels (×100), yellow lines show the decidua and myometrium boundary. 


therefore tested this idea. We found that ANP markedly stimulated 
human trophoblasts to invade Matrigels (Fig. 4b) (Supplementary 
Fig. 7a). In these cells, ANP receptor (also known as atrial natriuretic 
peptide receptor 1) mRNA expression was confirmed (Supplementary 
Fig. 7b) and ANP-stimulated intracellular cGMP production was 
detected (Fig. 4c) (Supplementary Fig. 7c). 

Our findings emphasize the importance of local ANP production by 
CORIN, which acts on trophoblasts and vascular cells in the uterus. 
Because heart-derived ANP circulates inside the vessel, our model may 
explain why cardiac CORIN failed to promote trophoblast invasion and 
uterine artery remodelling, as shown in Corin knockout/transgenic 
mice. To verify this hypothesis, we quantified Corin mRNA and 
protein in human uteruses by polymerase chain reaction with reverse 
transcription (RT-PCR) and enzyme-linked immunosorbent assay 
(ELISA). The levels were low in non-pregnant women but increased 
in pregnant women (Fig. 4d, e). In pre-eclamptic women, the levels 
were significantly lower than in normal pregnancies. Similar results 
were found by immunostaining (Fig. 4d and Supplementary Fig. 8). 
Consistent with this, pro-ANP levels in uterine tissues were signifi- 
cantly higher in pre-eclamptic women than in normal pregnant women 
(Fig. 4f), indicating that reduced uterine Corin expression impaired 
pro-ANP processing in these patients. Corin is a membrane-bound 
protein*'’, and recent studies showed that CORIN can be shed from 
d, Quantitative data (mean ± s.d.) of cytokeratin (Cytok) staining. e, Fewer and 
smaller decidual spiral arteries (arrows) in H&E-stained E18.5 Corin knockout 
and knockout/transgenic embryos. f-h, Co-staining of SMA, von Willebrand 
factor (vWF), cytokeratin and nuclei in E18.5 embryos. Red arrows indicate 
cytokeratin (green) signals, white arrows indicate von Willebrand factor (red) 
signals and orange arrows indicate mixed (yellow) signals. 
cardiomyocytes and that soluble CORIN is found in human plasma. 
We found that plasma CORIN levels were higher in pre-eclamptic 
patients than non-pregnant or normal pregnant women (Fig. 4g). 
Thus, CORIN levels in plasma did not reflect the levels in tissues, 
indicating that plasma CORIN was probably derived from the heart, 
where Corin expression increased in response to high blood volume 
and high blood pressure in pregnancy. These results provide further 
support for a local function of CORIN in the pregnant uterus. 

We next sequenced the CORIN gene” in pre-eclamptic patients and 
identified a mutation that alters Lys to Glu at position 317 in low- 
density lipoprotein receptor repeat 2 in one woman (Fig. 4h, j) and 
another mutation altering Ser to Gly at position 472 in the frizzled 2 
domain in two women from the same family who had pre-eclampsia 
(Fig. 4i, j). In functional studies, Lys317Glu and Ser472Gly mutations 
did not affect CORIN expression in HEK293 cells but markedly 
reduced CORIN activity in processing pro-ANP (Fig. 4k-n). The data 
were consistent with previous findings that low-density lipoprotein 
receptor repeats and frizzled domains are critical for CORIN activity”, 
suggesting that the mutations may impair CORIN function in the 
patients, thereby contributing to pre-eclampsia. Interestingly, 
CORIN variants in the frizzled 2 domain that impaired CORIN func- 
tion have been reported in African American people’””®, a high-risk 
population for pre-eclampsia. 
Figure 3 | Hypertension, proteinuria and uteroplacental pathology in 
pregnant ANP knockout mice. a, Blood pressure (mean ± s.d.) in non- 
pregnant females, **P < 0.01. b, Elevated blood pressure (mean ± s.d.) in 
pregnant ANP knockout mice. †P < 0.05 versus non-pregnant level. 
c, Gestational proteinuria in ANP knockout mice. Data are mean ± s.d. 
d, e, Impaired trophoblast invasion and smooth muscle remodelling in E12.5 
embryos stained for cytokeratin (d) or SMA (e). Boxed areas in top panels are 
shown at a higher magnification (×200). f, Impaired trophoblast invasion in 
E18.5 embryos stained for cytokeratin. g, Quantitative data (mean ± s.d.) of 
cytokeratin staining in E18.5 ANP knockout embryos. h, Impaired decidual and 
myometrial artery remodelling (arrows) in H&E-stained E18.5 ANP knockout 
embryos. i, Co-staining of cytokeratin, von Willebrand factor and nuclei in 
E18.5 ANP knockout embryos. Red arrows indicate cytokeratin (green) signals 
and white arrows indicate von Willebrand factor (red) signals. 
Figure 4 | ANP-stimulated human trophoblast invasion, and impaired uterine 
Corin expression and Corin mutations in pre-eclamptic patients. a, A model 
showing that CORIN-produced ANP in the pregnant uterus promotes trophoblast 
invasion (red arrows) and vascular-wall remodelling (black arrows). 
b, c, ANP-stimulated human BeWo trophoblasts and primary trophoblasts in 
Matrigel invasion (b) and intracellular cGMP production (c). Data are 
mean ± s.d. *P < 0.05; **P < 0.01 versus control. d-f, Corin mRNA (d) and 
protein (e), and pro-ANP levels (f) in human uterus samples. Horizontal lines 
indicate mean values. g, Plasma-soluble CORIN levels (mean ± s.d.) in 
pre-eclamptic patients and normal controls. h-j, CORIN gene mutations causing 
Lys317Glu (h) and Ser472Gly (i) changes in CORIN (j). k, l, Expression of 
Lys317Glu and Ser472Gly mutants in HEK293 cells (top panels). Vector, WT CORIN 
and inactive CORIN Arg801Ala and Ser985Ala mutants were controls. Lys317Glu 
and Ser472Gly mutations reduced pro-ANP processing activity (bottom panels). 
m, n, Quantitative data (mean ± s.d.) from three experiments or more. Fz, 
frizzled; LDLR, LDL receptor; SR, scavenger receptor; TM, transmembrane. 
Previously, high levels of plasma pro-ANP or ANP were detected in 
pre-eclamptic patients””’*. As shown by our plasma soluble CORIN 
data, plasma protein levels may not reflect those in tissues. Taken 
together, our data show a novel local function of CORIN and ANP 
in promoting trophoblast invasion and spiral artery remodelling to 
prevent hypertension in pregnancy. The data suggest that impaired 
Corin expression or function in the pregnant uterus may be an import- 
ant mechanism underlying pre-eclampsia. Studies to better under- 
stand impaired uterine Corin expression in pre-eclamptic patients 
may help to develop new strategies to enhance the CORIN-ANP 
pathway and prevent or treat this life-threatening disease. 


METHODS SUMMARY 


Corin and ANP knockout mice have been described previously'®”’. Transgenic 
mice with cardiac Corin expression were generated using a heart-specific pro- 
moter. Blood pressure was measured by radiotelemetry’®. Tissue sections from 
non-pregnant and pregnant mice were stained with haematoxylin and eosin, 
Masson’s trichrome, periodic acid-Schiff or von Kossa, or immunostained with 
antibodies against cytokeratin, SMA, von Willebrand factor or CORIN. Renal 
sections were also examined by electron microscopy. Trans-well invasion assay 
was carried out with human primary villous trophoblasts (ScienCell) and tropho- 
blastic JEG3, BeWo, JAR cell lines (ATCC) in Matrigel Invasion Chambers (BD 
Biosciences). ANP-stimulated cGMP production in trophoblasts was assayed in 
96-well plates. Intracellular cGMP levels were determined using an enzyme 
immunoassay kit (Enzo Life Sciences). Corin levels in human blood and uterus 
tissue samples were measured using ELISA”. Pro-ANP levels in human uterus 
tissues were also measured using ELISA (Alpco Diagnostics). Corin gene exons™ 
from pre-eclamptic patients were PCR-amplified and sequenced directly. Corin 
gene mutations that were identified were studied by expressing mutant CORIN 
proteins in HEK293 cells and testing their activities in pro-ANP processing assays, 
as described previously”®. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Knockout and transgenic mice. Corin knockout mice were described previously’®. 
ANP knockout mice (B6.129P2-Nppatm1Unc/J) were from the Jackson Laboratory. 
To make transgenic mice expressing Corin in the heart, the full-length mouse Corin 
cDNA was inserted into a construct driven by the mouse α-myosin heavy chain 
(α-MHC) promoter. Pro-nuclear microinjection and breeding of transgenic mice 
were carried out at the Case Western Reserve University Transgenic Core. Corin 
knockout and transgenic mice were crossed to generate knockout/transgenic mice. 
Littermates were used as controls. The animal study was conducted in accordance 
with the National Institutes of Health guidelines and approved by the Institutional 
Animal Care and Use Committee at the Cleveland Clinic. 

Blood-pressure monitoring. Radiotelemetry was used for real-time blood- 
pressure monitoring in conscious and unrestrained mice’®. Female mice (8-12 
weeks old) were chronically instrumented in the left carotid artery with a PA-C10 
device (Data Sciences International) and rested for at least 7 days to recover from 
the surgery. The mice were mated and checked for vaginal plugs to establish 
gestation timing. The day on which a plug was observed was defined as E0.5. 
The mating mice were homozygous except for those in the fetal testing experi- 
ment. Telemetry receivers (model RPC-1) were placed under individual cages for 
data acquisition using the Dataquest A.R.T. 4.0 Gold System (Data Sciences 
International). Data presented were from continuous recording of at least 6 h 
per day (10:00 to 16:00). 

Urinary protein measurement. Urine samples were collected from non-pregnant 
mice and pregnant mice at mid (8-10 days post coitus) and late (16-18 days post 
coitus) gestational stages. Urinary protein levels were measured using a colorimetric 
assay based on a modified Bradford method (Bio-Rad). 

RT-PCR, western blot analysis and ELISA. Total RNAs were isolated from 
cultured cells or mouse and human tissues using TRIzol reagent (Invitrogen) or 
an RNeasy kit (Qiagen), and were used to synthesize the first strand cDNAs. RT- 
PCR was carried out using oligonucleotide primers that were specific for the 
mouse or human CORIN, mouse ANP, or human ANP receptor genes. 
Quantitative RT-PCR for human CORIN mRNA expression in uterus tissues 
was carried out using the PRISM 7500 System (Applied Biosystems). The β-actin 
gene was used as an internal control. Quantitative RT-PCR for mouse ANP 
mRNA in uteruses was carried out using the iCycler system (Bio-Rad). For western 
blot analysis of CORIN protein, membrane fractions from tissue homogenates 
were isolated by ultracentrifugation, as described previously”. Proteins were 
analysed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and 
western blot using a polyclonal antibody (Berlex Biosciences). Western blot 
analysis of pro-ANP in heart samples was carried out using a polyclonal antibody 
(Santa Cruz). Processing pro-ANP by CORIN in transfected cells was analysed by 
western blot analysis, as described previously*’. Pro-ANP in human uterus tissues 
was measured by an amino-terminal (NT) pro-ANP ELISA kit from Alpco 
Diagnostics. Human CORIN in uterus tissues or plasma was measured by 
ELISA, as described previously”’. 

Histology and immunohistochemistry. Tissues were fixed with 4% para- 
formaldehyde and embedded in paraffin. Sections were stained with H&E, 
Masson’s trichrome, PAS or von Kossa. For immunohistochemical or immuno- 
fluorescent analysis, antibodies against SMA (Sigma-Aldrich), von Willebrand 
factor (Sigma-Aldrich) and cytokeratin (Dako) were used to label smooth muscle 
cells, endothelial cells and trophoblasts, respectively. For human CORIN, an 
antibody from Berlex Biosciences was used. Secondary antibodies were conjugated 
with horseradish peroxidase or Alexa Fluor 488 (green) or Alexa Fluor 594 (red) 
(Invitrogen). Tissue sections were mounted with or without DAPI-containing 
(blue) mounting medium (Dako). For ANP expression in mouse uterus tissues, 
a polyclonal antibody from Millipore was used. Control sections were treated 
similarly but without the primary antibodies. Photographs were taken with a light 
or fluorescent microscope equipped with a digital camera (Olympus). Data are 
from experiments using five or more mice per study group. 

For immunohistochemical analysis of trophoblast invasion in mouse embryos, 
tissue samples from at least five mice per group, and at least two implant sites per 
mouse were used. Serial sections (>50 per embryo) of 5 μm in thickness were 
prepared. The position of the maternal artery was used as a guide to orient section 
positions. At least 4-6 sections from the centre of the placenta of each embryo were 
used for immunohistochemical analysis. Slides that were stained for cytokeratin 
were examined by two individuals. The sections that showed the deepest tropho- 
blast invasion are presented. These sections were also analysed by ImagePro soft- 
ware to quantify cytokeratin staining. For each section that was analysed, the entire 
area of the decidua and myometrium was scanned by the software. 

Electron microscopy. Kidneys from pregnant mice at E18.5 were fixed in 3% 
glutaraldehyde, treated with 1% osmium tetroxide and embedded in an Araldite- 
Epon mixture. Semi-thin sections (0.6 μm) were prepared and examined with a 
transmission electron microscope (JEOL JEM-1210) at the Lerner Image Core of the 
Cleveland Clinic. Data are from experiments using at least three mice per study 
group. 

Trans-well invasion assay. Human trophoblastic JEG3, BeWo and JAR cells from 
the American Type Culture Collection were cultured in Minimum Essential 
Medium (JEG3), RPMI1640 (JAR) and F-12K (BeWo) medium, respectively, with 
10% FBS at 37 °C. Primary human villous trophoblasts from ScienCell Research 
Labs (Carlsbad) were cultured in the Trophoblast Medium (ScienCell) with 10% 
FBS. Transwell invasion assays were carried out using the BioCoat Growth Factor 
Reduced Matrigel Invasion Chambers (pore size of 8 μm) and control inserts (no 
Matrigel coating) (BD Biosciences) in 24-well plates. Culture medium containing 
human ANP (Calbiochem) was added to the bottom wells, and cell suspension 
(5 X 10*) was added to the top wells and incubated at 37 °C for 24h. Non-invading 
cells were removed from the upper surface of the Matrigel layer by gentle 
scrubbing. The cells on the lower surface of the membrane were stained using 
Diff-Quick staining solutions. The membranes were excised and mounted onto 
glass slides. Invasion indices were determined by counting the number of stained 
cells on the membrane under a light microscope. The assay was carried out in 
duplicate in at least three independent experiments. 

cGMP assay. ANP-stimulated intracellular cGMP production assay was per- 
formed with JEG3, JAR and BeWo cells and primary human trophoblasts using 
a method described previously’. The cells were grown in 96-well plates. Confluent 
cells were washed once with serum-free medium. Human ANP was added to 
serum-free medium and incubated with cells at 37 °C for 30 min. In these experi- 
ments, ANP was more potent in stimulating intracellular cGMP production when 
serum-free medium was used (data not shown). The cells were lysed with 0.1 M 
HCl. Intracellular cGMP levels in ANP-stimulated cells were determined using an 
EIA kit (Enzo Life Sciences). Each experimental condition was assayed in duplicate 
in at least three independent experiments. 

Human blood and tissue samples. The study was approved by local ethics 
committees and participants gave informed consent. Women of normal preg- 
nancy or with pre-eclampsia, and age-matched non-pregnant normal controls 
who underwent routine medical check-ups were recruited. All participants were 
ethnic Han Chinese. Hypertension was defined as diastolic pressure >90 mm Hg 
and/or systolic pressure >140 mm Hg on at least two occasions. Pre-eclampsia 
was defined as hypertension that appeared after 20 weeks of gestation with 
proteinuria (>300 mg urinary protein per 24 h). Patients with chronic hyper- 
tension, chronic kidney disease, diabetes and heart disease were excluded. Uterus 
tissues were obtained during caesarean sections in pregnant women or operations 
for uterine leiomyoma in non-pregnant women. Clinical characteristics of women 
who provided blood and those who provided uterus tissue samples are summarized 
in Supplementary Tables 1 and 2, respectively. 
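
The inclusion criteria above can be written as a small rule, which makes the thresholds explicit. The following is only a minimal sketch: the record layout (`Reading`), the function names and the reading of "appeared after 20 weeks" as "no elevated measurement at or before week 20 and at least two afterwards" are assumptions made for illustration, not part of the study protocol. Exclusion criteria such as chronic kidney disease, diabetes and heart disease are not modelled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Reading:
    """One blood-pressure measurement (hypothetical record layout)."""
    gestational_week: float
    systolic_mmhg: float
    diastolic_mmhg: float


def is_elevated(r: Reading) -> bool:
    # Thresholds as stated in the text: systolic >140 and/or diastolic >90 mm Hg.
    return r.systolic_mmhg > 140 or r.diastolic_mmhg > 90


def meets_preeclampsia_criteria(readings: Sequence[Reading],
                                urinary_protein_mg_per_24h: float) -> bool:
    """Apply the stated definition to one participant.

    Assumption: hypertension "appeared after 20 weeks" means no elevated
    reading at or before week 20 and at least two elevated readings later.
    """
    early = [r for r in readings if r.gestational_week <= 20 and is_elevated(r)]
    late = [r for r in readings if r.gestational_week > 20 and is_elevated(r)]
    return not early and len(late) >= 2 and urinary_protein_mg_per_24h > 300


# Invented example values, for illustration only.
example = [Reading(12, 118, 76), Reading(30, 150, 95), Reading(32, 145, 88)]
print(meets_preeclampsia_criteria(example, urinary_protein_mg_per_24h=450))
```

A participant with an elevated reading before week 20 is rejected by this rule, which loosely mirrors the exclusion of chronic hypertension.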

CORIN gene sequences in patients. Blood samples from 56 patients with pre- 
eclampsia were collected into tubes containing EDTA as an anticoagulant. 
Genomic DNA was extracted from white blood cells using the QIAamp DNA 
Mini kit (Qiagen) and used in PCR to amplify exon sequences of the CORIN 
gene’. PCR products were used for direct DNA sequencing. Mutations that were 
identified were verified by independent PCR and DNA sequencing. Additional 
PCR and DNA sequencing were carried out with DNA samples from more than 
100 normal controls to verify that mutations that were identified in patients did 
not exist in the normal population. 

Expression and functional analysis of Corin mutants. Plasmids expressing 
human wild-type Corin and two inactive mutants Arg801Ala and Ser985Ala, in which 
the activation cleavage site and catalytic site residues were mutated, respectively, were 
described previously**. Plasmids expressing Corin mutants Lys317Glu or Ser472Gly 
were constructed by PCR-based mutagenesis. Recombinant CORIN proteins that 
were expressed by these plasmids contained a carboxy-terminal V5 tag to be detected 
by an anti-V5 antibody (Invitrogen)**. Plasmids were transfected into HEK293 cells 
using Lipofectamine 2000 (Invitrogen). Cells were lysed and proteins were analysed 
by western blot using an anti-V5 antibody. To analyse the function of CORIN, 
recombinant human pro-ANP in conditioned medium was added to HEK293 cells 
expressing Corin wild-type or mutants and incubated at 37 °C for 2 h. Pro-ANP and 
ANP in the medium were immunoprecipitated and analysed by western blot. Protein 
bands on X-ray films were scanned by densitometry. The percentage of pro-ANP to 
ANP conversion was calculated as described previously”°. 

Statistical analysis. Results are presented as mean ± s.d. Differences between two 
groups were analysed with the Student’s t-test. Data involving more than two 
groups were analysed by analysis of variance followed by the Tukey multiple 
comparison test. Comparisons for Corin mRNA and protein and pro-ANP levels 
in human uterus samples were carried out using the Mann-Whitney-Wilcoxon 
test. A P value of less than 0.05 was considered statistically significant. 
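
As a worked illustration of the two-group comparison, the sketch below computes a Student's t statistic directly from summary values, using the litter sizes reported for ANP knockout and wild-type mice (4.4 ± 1.7, n = 25, versus 9.1 ± 1.2, n = 21). It assumes the classical equal-variance (pooled) form of the test; the text does not say which variant was used, and the function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal-variance form) and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# Litter sizes from the text: ANP knockout versus wild-type.
t_stat, df = pooled_t_from_summary(4.4, 1.7, 25, 9.1, 1.2, 21)
print(f"t = {t_stat:.2f}, df = {df}")
```

Converting the statistic to a P value requires the t distribution, for example from a statistics package. With 44 degrees of freedom, a statistic of this magnitude is consistent with the reported P < 0.001.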
Genome-wide protein-DNA binding dynamics 
suggest a molecular clutch for transcription factor function 


Colin R. Lickwar, Florian Mueller, Sean E. Hanlon, James G. McNally & Jason D. Lieb 


Dynamic access to genetic information is central to organismal 
development and environmental response. Consequently, genomic 
processes must be regulated by mechanisms that alter genome 
function relatively rapidly’ *. Conventional chromatin immuno- 
precipitation (ChIP) experiments measure transcription factor 
occupancy’, but give no indication of kinetics and are poor 
predictors of transcription factor function at a given locus. To 
measure transcription-factor-binding dynamics across the genome, 
we performed competition ChIP (refs 6, 7) with a sequence-specific 
Saccharomyces cerevisiae transcription factor, Rap1 (ref. 8). Rap1- 
binding dynamics and Rap1 occupancy were only weakly correlated 
(R² = 0.14), but binding dynamics were more strongly linked to 
function than occupancy. Long Rap1 residence was coupled to tran- 
scriptional activation, whereas fast binding turnover, which we 
refer to as ‘treadmilling’, was linked to low transcriptional output. 
Thus, DNA-binding events that seem identical by conventional 
ChIP may have different underlying modes of interaction that lead 
to opposing functional outcomes. We propose that transcription 
factor binding turnover is a major point of regulation in determin- 
ing the functional consequences of transcription factor binding, 
and is mediated mainly by control of competition between tran- 
scription factors and nucleosomes. Our model predicts a clutch-like 
mechanism that rapidly engages a treadmilling transcription factor 
into a stable binding state, or vice versa, to modulate transcription 
factor function. 

The diverse biological functions of Rap1 (ref. 9) make it an excellent 
model for testing the hypothesis that binding dynamics are important 
for transcription factor function (Supplementary Fig. 1). We developed 
a strain with two copies of RAP1. One copy of RAP1 was tagged with a 
3× Flag epitope and was constitutively expressed from the endogenous 
RAP1 promoter. A second copy of RAP1 was tagged with a 9× Myc 
epitope and was controlled by a weakened galactose-inducible pro- 
moter, GALL (an attenuated version of the GAL1 promoter) (Fig. 1a). 
This strain showed no growth defects in inducing (2% galactose) or 
non-inducing (2% dextrose) conditions (Fig. 1b and Supplementary 
Fig. 2). To avoid cell-cycle and DNA replication effects, for the dura- 
tion of the experiment the strain was arrested in G1 with alpha factor’. 
The induced Rap1 protein isoform could be detected as early as 30 min 
after galactose induction (Fig. 1c). The ratio of Rap1 isoforms pro- 
vided an estimate of the nucleoplasmic pool of Rap1 molecules 
(Fig. 1d). We then performed Myc and Flag ChIP experiments inde- 
pendently from extract corresponding to each of 10 time points (0, 10, 
20, 30, 40, 50, 60, 90, 120 and 150 min after induction). We also 
performed ChIP to measure total Rap1 occupancy using a Rap1- 
specific antibody at 0 and 60 min. DNA fragments enriched in the 


ChIPs were detected on whole-genome-tiling 12-plex microarrays 
containing 270,000 probes per subarray, with an average probe 
interval of 41 bp and an average probe length of 54 bp (Sup- 
plementary Fig. 3). The entire time-course experiment was performed 
in duplicate. (Procedural details can be found in the Methods.) 

After induction, Rap1-Myc was incorporated at targets where Rap1 
had previously been shown to bind*"° (Fig. 2a, b), indicating that the 
system was functioning as designed. The increase in Rap1 protein 
caused by the induction of the competitor did not cause an increase 
in the overall occupancy at the measured Rap1 sites (Fig. 2c, d and Sup- 
plementary Figs 4 and 5). As Rap1-Myc ChIP occupancy increased at 
sites of Rap1 binding, Rap1-Flag occupancy decreased coordinately 
(Fig. 2c, d and Supplementary Fig. 4). Thus, Rap1-Myc is competing 
specifically with Rap1-Flag at each locus, and Rap1-Myc binding is 
not the result of cooperativity or additional Rap1 binding locations. 

To interpret our data, we developed a model to determine turnover 
rates of Rap1 by modifying a fitting algorithm used previously to 
Figure 1 | Development of transcription factor competition ChIP yeast 
strains. a, Schematic of the Rap1 competition-ChIP yeast strain. b, Growth 
comparison of competition yeast strain and wild-type in inducing (2% 
galactose) and non-inducing (2% dextrose) conditions. c, Western blot analysis 
using an antibody against Rap1 (y-300). Strains containing only a Rap1-Myc or 
only Rap1-Flag copy are shown to the right to indicate the size of isoform- 
specific bands. The actin loading control is shown below. d, To estimate the 
dynamics of induction, the ratio of induced Rap1-Myc and constitutive Rap1- 
Flag protein is plotted. Data are from two technical replicates of two 
independent time-course replicates. Error bars represent s.e.m. 
Figure 2 | Rap1-bound sites exhibit distinct replacement dynamics. a, A 
Rap1 turnover experiment over a 30-kb region of chromosome II (Chr II). Rap1 
motifs and peaks are shown. b, Average log2 Myc/Flag ratio values for all Rap1 
targets (red) increase relative to non-Rap1 targets (blue). c, Rap1-Myc competes 
with Rap1-Flag for binding. Average single-channel intensity for Rap1-Myc 
and Rap1-Flag for a single probe (CHR15FS000978891) in the promoter of the 
TYE7 gene shows that the increase in Rap1-Myc is coincident with the loss of 
Rap1-Flag. d, Total Rap1 occupancy does not change during the time course. 
Average total Rap1 occupancy (occ.) (log2 ratio of Rap1 immunoprecipitation 


measure histone H3 turnover®. Under our experimental conditions, 
the extracted turnover rate for a transcription factor at a binding site is 
equivalent to its dissociation rate, which allows us to measure the 
residence time (Supplementary Figs 6-8 and Supplementary Text). 
Our experimental system can quantify binding events that have an 
apparent duration of 500 s or longer (Supplementary Figs 6 and 7, 
and Supplementary Text). Using our ChIP data and model we 
measured residence time of Rap1 at 439 peaks of Rap1 enrichment 
genome-wide, and at the 26 uniquely mappable telomeres (Fig. 2e-h, 
Supplementary Fig. 9 and Supplementary Text). Rap1 occupancy 
correlated only modestly with Rap1 residence (R² = 0.14; Spearman 
rank correlation = 0.37) (Fig. 2h), and distinct dynamics of Rap1-Myc 
incorporation were observed at different genomic loci (Fig. 2e-h). 
Thus, residence time and occupancy are distinct measurements, and 
our system was capable of distinguishing Rap1 turnover kinetics at 
different loci in the same experiment. 
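
To make the idea of extracting a residence time from a competition time course concrete, the sketch below fits a deliberately simplified replacement model to synthetic data. It is not the fitting algorithm used in the paper, which is described in the Supplementary Text. It assumes first-order dissociation with rate 1/residence time and a competitor that makes up a constant fraction of the free pool from time 0. That second assumption ignores the induction lag noted earlier, and it works with a competitor fraction rather than the measured log2 Myc/Flag ratio. All function and parameter names are hypothetical.

```python
import math

# Time points (minutes after induction) taken from the text.
TIME_POINTS = [0, 10, 20, 30, 40, 50, 60, 90, 120, 150]


def competitor_fraction(t_min, residence_min, pool_fraction=0.5):
    """Fraction of bound sites carrying the competitor isoform at time t.

    Toy first-order replacement: bound molecules leave with rate
    1/residence_min and are replaced by the competitor with probability
    pool_fraction (assumed constant, which is a simplification).
    """
    return pool_fraction * (1.0 - math.exp(-t_min / residence_min))


def fit_residence_time(times, observed, candidates):
    """Grid search for the residence time that minimises the squared error."""
    def sse(tau):
        return sum((competitor_fraction(t, tau) - y) ** 2
                   for t, y in zip(times, observed))
    return min(candidates, key=sse)


# Synthetic example: one slowly exchanging site and one fast site.
slow_site = [competitor_fraction(t, 60.0) for t in TIME_POINTS]
fast_site = [competitor_fraction(t, 10.0) for t in TIME_POINTS]
grid = [float(tau) for tau in range(5, 125, 5)]
print(fit_residence_time(TIME_POINTS, slow_site, grid))
print(fit_residence_time(TIME_POINTS, fast_site, grid))
```

In this toy setting two sites with identical final occupancy are separated purely by how quickly the competitor signal rises, which is the distinction between residence time and occupancy drawn above.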

We found that efficient transcriptional activation was associated 
with stable Rap1 binding, whereas lower transcript production was 
associated with treadmilling, despite similar levels of Rap1 occupancy. 
Long Rap1 residence times occurred at ribosomal protein gene 


(y-300) to input ratio z score) at Rap1 targets at time 0 is plotted against that at 60 
min. e, Average log2 Myc/Flag ratios for the promoter of ribosomal protein gene 
RPL29B (red points). The model fit for the residence time parameter that best fits 
these data is shown (black line). f, Colorimetric representation of log2 Myc/Flag 
ratios for all 465 Rap1 targets, sorted by the initial (normalized) log2 Myc/Flag 
ratio. g, For each site in f, the log2 Myc/Flag ratio predicted by our residence 
model, based on the calculated residence time, is shown. h, Rap1 occupancy 
(time 0 z score) versus Rap1 residence for 465 Rap1 targets (R² = 0.14; 
Spearman rank correlation = 0.37). 


promoters, which are very highly transcribed and strongly activated 
by Rap1 (refs 6, 11, 12) (Fig. 2e, h). In contrast, Rap1 binding to non- 
ribosomal protein targets and to the infrequently transcribed telomeric 
and subtelomeric Rap1 sites was characterized by fast turnover (Fig. 2h 
and Supplementary Figs 10 and 11). Stable Rap1 binding seems to 
support higher mRNA production through more efficient recruitment 
of the RNA polymerase II machinery’ (Fig. 3a). Genes with stable 
Rap1 binding at their promoters indeed exhibited high levels of RNA 
polymerase II association’ (Fig. 3a), high transcription initiation 
rates‘? (Fig. 3b) and high mRNA levels (Fig. 3c). Rap1 occupancy 
does not correspond as strongly (Fig. 3a—c; right panels). TATA- 
binding protein (TBP) turnover’ is also slow at ribosomal protein 
genes, indicating that slow transcription-factor-binding dynamics 
may be a hallmark of efficient transcription initiation’. 
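
The comparisons above rest on rank correlations, which capture monotonic relationships that need not be linear. The sketch below shows, with invented numbers, how a Spearman correlation can be computed and why it can differ from the squared Pearson coefficient (R²). The data values and function names are illustrative assumptions, not measurements from the study.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: output rises monotonically but non-linearly with residence.
residence_min = [5, 10, 20, 40, 80]
transcript_output = [1, 2, 4, 50, 400]
print(spearman(residence_min, transcript_output))
print(pearson(residence_min, transcript_output) ** 2)
```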

We next examined possible mechanisms for the locus-specific differ- 
ences in Rap1 residence time. Nucleosomes are a major regulator of 
genome accessibility’, so we examined the relationship between histone 
modification and Rap1-binding dynamics. Sites of long Rap1 res- 
idence were strongly correlated with sites of enrichment for the histone 
acetyltransferases Gcn5 and particularly Esa1 (ref. 17) (Fig. 3d, e). 
Figure 3 | RNA polymerase II recruitment, mRNA production and histone 
acetyltransferase recruitment are associated with long Rap1 residence. 
a-d, Left, Rap1 residence (res.) time is plotted on the x axis. Right, Rap1 
occupancy is plotted on the x axis. In both panels, the following are plotted on 
the y axis: RNA polymerase II (RNA Pol II) occupancy (a); the number of 
mRNA transcripts per h (b) (ref. 11); mRNA levels at time 0 (c); and histone 
acetyltransferase Esa1 occupancy z scores (d). r_s is the Spearman correlation 


Nucleosome instability reinforced by Gcn5 and Esa1 (members
of SAGA and NuA4, respectively) may stabilize Rap1 binding by
reducing competition with nucleosomes. Other indicators of active
promoters, including H3K4me3, occupancy by the bromodomain
protein Bdf1 (similar to mammalian TAF1), and acetylation of
H3K9, H4 and H3K14, were also more strongly associated with
Rap1 residence time than with Rap1 occupancy (Fig. 3e).

In general, sites that are bound by Rap1 are strongly depleted of
nucleosomes. However, the binding dynamics data allowed us to
appreciate a more complex relationship. We grouped Rap1-bound loci
into four categories based on their measured Rap1 residence time:
longest, long, short, and shortest. We then aligned the Rap1 motifs
in each category and plotted nucleosome occupancy relative to the
motif position, reasoning that nucleosomes in direct proximity to
the DNA motif bound by Rap1 would have a strong influence on
Rap1 residence times. As expected, strong nucleosome depletion
was centred on the Rap1 motif (Fig. 4a). However, as Rap1-binding
turnover increased, nucleosome depletion was correspondingly less
pronounced. Thus, not all highly occupied Rap1 sites are equally
depleted of nucleosomes in vivo. Instead, a subset of loci at which
Rap1 occupancy is high but binding turnover is also high (treadmilling)
are associated with higher nucleosome occupancy (Supplementary
Fig. 11b). No consistent relationship is apparent when Rap1 targets
are grouped by occupancy, as measured by traditional ChIP (Fig. 4a).

We next examined nucleosome occupancy on naked DNA in the
absence of Rap1 or any protein cofactors. Notably, DNA-encoded




value. e, Colorimetric representation of the Spearman rank correlation between
various genomic data sets and Rap1 occupancy (right) and Rap1 residence
(centre), ordered by the magnitude of the difference between the residence and
occupancy correlations for each comparison (left). WCE, whole-cell extract;
PBM, protein-binding microarray. Telomeric targets are excluded from
analysis (Supplementary Text).


nucleosome occupancy measured in vitro is low only for the class of
Rap1 targets with the most stable binding (Fig. 4b and Supplementary
Fig. 11b). This pattern was not recapitulated when Rap1 targets were
sorted by occupancy (Fig. 4b). This suggests that the nucleosome
behaviour surrounding transcription factor motifs is at least partially
encoded in DNA, and that this DNA-encoded nucleosome occupancy
can influence the binding dynamics of transcription factors,
and thereby affect functional outputs (Supplementary Fig. 11a-c).

We sought further evidence supporting direct competition between
nucleosomes and Rap1. We compared histone H3 turnover with Rap1
residence times and found that loci with long Rap1 residence times also
had relatively slow H3 turnover. Similarly, histone H3 molecules that
treadmill are found almost exclusively at sites of Rap1 treadmilling
(Fig. 4c). Rap1-nucleosome interactions isolated by immunoprecipitating
Rap1 after MNase digestion were also detected more often at
treadmilling sites (Fig. 4d). Further evidence for competition comes
from a marked increase in nucleosome occupancy directly over
Rap1 motifs after Rap1 depletion at treadmilling loci, but not at loci
with stable Rap1 binding (Fig. 4e). Together, these relationships provide
evidence for direct competition between Rap1 and nucleosomes.

Given that high DNA-encoded nucleosome occupancy is associated
with rapid Rap1 turnover (Fig. 4b), it is reasonable to expect that
differences in the strength of the DNA motif bound by the transcription
factor would also influence turnover. To test this, we examined the
relationship between Rap1 turnover and experimentally measured
in vitro Rap1 affinity at each locus. For sites with longer Rap1 residence,
Figure 4 | Evidence for competition between Rap1 and nucleosomes.

a, Colorimetric representation of in vivo nucleosome occupancy centred on
Rap1-binding motifs. Loci are ordered by Rap1 residence time (top) or Rap1
occupancy (bottom). The total number of Rap1 targets in each group is shown
in parentheses. To the right are plots of the average nucleosome occupancy for
each group centred on the Rap1 motif. Targets with several Rap1 motifs are
represented by one randomly chosen motif. b, Same as (a) but for in vitro
nucleosome occupancy. c, Histone H3 turnover versus Rap1 residence for
ribosomal protein genes (red) and other targets (blue). d, The number of
Rap1-nucleosome interactions detected within each Rap1 target peak boundary
on a log scale. e, Relative change in nucleosome occupancy following Rap1
depletion centred on Rap1 motifs grouped by residence time. A value of zero


Rap1’s affinity for DNA was generally high, whereas Rap1 sites with the
fastest turnover had lower experimentally measured Rap1 affinity
(Fig. 4f). Despite this relationship, among sites with strong Rap1
motifs, nucleosome occupancy was still the major factor distinguishing
sites with long Rap1 residence times from those with higher turnover
(Fig. 4f).

Longer in vivo Rap1 residence times at sites of high Rap1 affinity are
consistent with control of the Rap1-nucleosome competition being
encoded directly in the DNA sequence to a substantial degree. We
reasoned that this would be reflected in the sequence of the DNA
motifs bound by Rap1. Indeed, we found differences in the composition
of the Rap1 motifs for each of the turnover categories, with the
longest residence Rap1 sites preferentially containing A or T at
positions 4, 8, 12 and 13 (Fig. 4g-i). These associations were not as strong
when Rap1 targets were ordered by occupancy (Supplementary Fig. 12).
Sites at which residence was shortest tended to contain a degenerate
Rap1 binding motif (Fig. 4g).
represents no relative change in nucleosome occupancy. f, In vitro Rap1 affinity
for its cognate target, as measured using a PBM, compared to Rap1 residence.
Colours represent histone H4 occupancy z scores (> −1.5 purple (high), < −1.5
green (low)). g, Top-position weight matrix motifs discovered for Rap1
targets, grouped by residence. The number of targets for each group is in
parentheses. h, All motifs from the top-position weight matrix for each
residence group are coloured by their A or T (A/T) (purple), or G or C (G/C)
(green) content at each motif base position. i, Percentage of A/T content for the
entire motif (blue), AA/AT/TA/TT at the twelfth and thirteenth motif position
(green), TT at the twelfth and thirteenth position (purple) and GG/GC/CG/CC
at the twelfth and thirteenth position (red) for Rap1 targets, grouped by
residence and telomeric regions.


For several other transcription factors, microscopy-based measurements
at individual loci point to much shorter residence times than
those measured for Rap1 (refs 1, 3, 4, 25-27). For example, despite an
in vitro residence time similar to that of Rap1 (~90 min), glucocorticoid-receptor
binding seems to be exceptionally short-lived at individual
loci. Nevertheless, an overall positive relationship between residence
time and transcriptional output is observed for both Rap1 and
glucocorticoid receptor. The differences in Rap1 and glucocorticoid-binding
dynamics, and the disparity between glucocorticoid-receptor
residence time in vitro and in vivo, may reflect different modes of
interactions with nucleosomes. The binding affinity of glucocorticoid
receptor may be particularly sensitive to nucleosome packaging or
may be regulated by the availability of DNA that is transiently
accessible from the nucleosome surface. This type of accessibility
on the nucleosome itself could be regulated, and would not rely on
the complete loss of a nucleosome. Rap1 itself exhibits such properties,
with its binding progressively inhibited as the motif recognized by
Rap1 is moved closer to the nucleosome dyad. Our data do not
exclude a model in which transcription factor binding occurs adjacent
to a nucleosome, and competition occurs without complete nucleosome
eviction.

In this study, we determined Rap1-binding dynamics genome-wide 
using competition ChIP. Rap1 occupancy was only weakly correlated
with Rap1 binding turnover, showing that these are independently 
measurable properties. Binding turnover correlates more strongly than 
occupancy with many aspects of genomic function, most predominantly 
RNA polymerase II recruitment and transcript levels. Stable Rap1 bind- 
ing is associated with activation, whereas Rap1 treadmilling is associated 
with higher nucleosome occupancy, nucleosomal treadmilling, and a 
lack of transcription. Our work provides the basis for a model in which 
transcription-factor-binding dynamics is a major point of regulation in 
determining the functional consequences of transcription factor bind- 
ing. Importantly, this model provides a plausible mechanism for a locus- 
specific switch between inactive and active transcription factor states, or 
even for a rapid switch from an activator (stable binding) to a repressor 
(treadmilling), or vice versa. This could be achieved at any given locus 
through a ‘clutch’ that alters the balance of the continual competition 
between transcription factors and nucleosomes (Supplementary Fig. 1). 
This clutch could operate through histone modification, histone-variant 
incorporation, ATP-dependent chromatin remodelling, cofactor bind- 
ing, or any other site-directed chromatin altering activity. 


METHODS SUMMARY 


Strain construction. The RAP1 gene and promoter were cloned into the pRS403
plasmid and integrated by homologous recombination into the HIS3 locus of the
BY4741 S. cerevisiae strain. The two copies of RAP1 were then sequentially tagged
using the 9X Myc epitope from pYM20:hphNT1 at the HIS3 copy of RAP1 and the
3X Flag tag from p3FLAG-KanMx at the endogenous RAP1 copy. The promoter of the
HIS3 copy of RAP1 was replaced by homologous recombination with the
GALL:natNT2 promoter amplified from the pYM-N27 plasmid. Integrations were
confirmed using PCR and western blot analysis. The BAR1 gene was knocked out by
homologous recombination using a LEU2 gene amplified from pRS405.

Time course. Yeast were grown overnight in YPD (yeast extract 1%, peptone 2%
and dextrose 2%) and used to inoculate 800 ml of YPR (yeast extract 1%, peptone
2% and raffinose 2%) to an attenuance at 600 nm (D600) of 0.2 (Genesys 20
Spectrophotometer) in a 4-l Erlenmeyer flask. These cells were grown to a D600 of
0.4 and subsequently arrested using 5 µM alpha factor (400 µl of 10 mM,
GenScript) until 95% of the yeast cells were unbudded (~3 h). Cells were then
induced by adding 40% galactose to a final concentration of 2%. At this time,
additional alpha factor was added (400 µl of 10 mM, GenScript). Samples were
collected at time points 0, 10, 20, 30, 40, 50, 60, 90, 120 and 150 min after galactose
induction. At each time point, 35 ml of culture was taken and added immediately
to 37% formaldehyde to a final concentration of 1% for 20 min. Thirteen millilitres
were taken for subsequent RNA preparation. Two millilitres were taken for protein
preparation by pelleting cells and heating at 95 °C for 5 min in 0.06 M Tris-HCl,
pH 6.8, 10% glycerol, 2% SDS, 5% 2-mercaptoethanol and 0.0025% bromophenol
blue. All samples were frozen immediately in liquid nitrogen.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Strain construction. The RAP1 gene and promoter were cloned into the pRS403
plasmid and integrated by homologous recombination into the HIS3 locus of the
BY4741 S. cerevisiae strain. The two copies of RAP1 were then sequentially tagged
using the 9X Myc epitope from pYM20:hphNT1 at the HIS3 copy of RAP1 and the
3X Flag tag from p3FLAG-KanMx at the endogenous RAP1 copy. The promoter of the
HIS3 copy of RAP1 was replaced by homologous recombination with the
GALL:natNT2 promoter amplified from the pYM-N27 plasmid. Integrations were
confirmed using PCR and western blot analysis. The BAR1 gene was knocked out by
homologous recombination using a LEU2 gene amplified from pRS405.

Time course. Yeast strains were grown overnight in YPD (yeast extract 1%,
peptone 2% and dextrose 2%) and used to inoculate 800 ml of YPR (yeast extract
1%, peptone 2% and raffinose 2%) to an attenuance at 600 nm (D600) of 0.2
(Genesys 20 Spectrophotometer) in a 4-l Erlenmeyer flask. These cells were grown
to a D600 of 0.4 and subsequently arrested using 5 µM alpha factor (400 µl of
10 mM, GenScript) until 95% of the yeast cells were unbudded (~3 h). Cells were
then induced by adding 40% galactose to a final concentration of 2%. At this time,
additional alpha factor was added (400 µl of 10 mM, GenScript). Samples were
collected at time points 0, 10, 20, 30, 40, 50, 60, 90, 120 and 150 min after galactose
induction. At each time point, 35 ml of culture was taken and added immediately
to 37% formaldehyde to a final concentration of 1% for 20 min. Thirteen millilitres
were taken for subsequent RNA preparation. Two millilitres were taken for protein
preparation by pelleting cells and heating at 95 °C for 5 min in 0.06 M Tris-HCl,
pH 6.8, 10% glycerol, 2% SDS, 5% 2-mercaptoethanol and 0.0025% bromophenol
blue. All samples were frozen immediately in liquid nitrogen.
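The concentrations quoted in the time course can be checked with simple dilution arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the assumption that the culture volume is still about 800 ml at the time of galactose induction is ours, not a statement from the protocol.

```python
def final_concentration(stock_conc, stock_volume, culture_volume):
    """Concentration after adding stock_volume of stock to culture_volume.

    The result is in the units of stock_conc; both volumes must share one unit.
    """
    return stock_conc * stock_volume / (culture_volume + stock_volume)


def stock_volume_needed(stock_conc, target_conc, culture_volume):
    """Volume of stock that brings culture_volume to target_conc.

    Solves stock_conc * v = target_conc * (culture_volume + v) for v.
    """
    if not 0 < target_conc < stock_conc:
        raise ValueError("target must lie between 0 and the stock concentration")
    return target_conc * culture_volume / (stock_conc - target_conc)


# Alpha factor: 400 µl (0.4 ml) of 10 mM (10,000 µM) stock into 800 ml of culture.
alpha_factor_uM = final_concentration(10_000, 0.4, 800)

# Formaldehyde: volume (ml) of 37% stock that fixes a 35 ml sample at 1%.
formaldehyde_ml = stock_volume_needed(37, 1, 35)

# Galactose: volume (ml) of 40% stock for a final 2%, ASSUMING an 800 ml culture.
galactose_ml = stock_volume_needed(40, 2, 800)
```

Under these assumptions the alpha factor works out to just under 5 µM, which agrees with the stated value, and each 35 ml sample needs a little under 1 ml of 37% formaldehyde.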

Turnover model. A mathematical model is required to interpret the data, and to 
obtain binding turnover rates. We used a modified version of a histone H3 
turnover model. The original H3 turnover model assumed that there was no
competitor protein present before its induction. We were also unable to detect
the presence of the Rap1 competitor protein before induction using western blot 
analysis. Nevertheless, at each locus we consistently measured a non-zero com- 
petitor signal from the microarray even before the competitor was induced. This 
probably reflects the nonspecific background of our microarrays. Most of the steps 
that could contribute to this noise—for example, non-specific pull down from the 
beads, site-specific variations in the DNA amplification or nonspecific binding 
bias in hybridization—would affect the constitutive and competitor signal equally, 
and therefore we assume for simplicity that the total nonspecific background 
signal is approximately the same for the constitutive signal and for the competitor 
signal in our modified turnover model. We assume that at each binding site the 
measured immunoprecipitation signal (mIP(t)) is the true immunoprecipitation 
signal (IP(t)) plus the background (BGD(t)): 


$$\mathrm{mIP}(t) = \mathrm{IP}(t) + \mathrm{BGD}(t) \qquad (1)$$

We assume that at the beginning of the experiment (before induction) the true IP
signal of the competitor is zero. The background signal at the start of the
experiment is therefore the signal measured for the competitor protein A at time 0:

$$\mathrm{mIP}_A(0) = \mathrm{BGD}(0) \qquad (2)$$


The measured background signal will generally be time-dependent because our 
data showed that the measured raw intensities of the immunoprecipitation signals 
for the constitutive and competitor Rap1 proteins fluctuated from one time point
to the next, even though their relative proportions remained roughly the same. 
This suggests that there are systematic variations in either the ChIP conditions or 
the microarray imaging conditions from one time point to the next, which would 
also probably influence the background signal. 

The systematic changes in either the ChIP or imaging conditions can be quan- 
tified by comparing the total signal of constitutive Rap1 plus competitor Rap1 at 
each binding site as a function of time. We assume that the addition of competitor 
does not change total occupancy (Supplementary Figs 3 and 4). Thus, at each
binding site, the ratio of the total signal (constitutive plus competitor Rap1) at time 
t versus time 0 generates a scaling factor to account for systematic fluctuations over 
time. This scaling factor (the part in brackets in equation (3)) can be used to 
calculate the background signal at time t based on the background at time 0: 


$$\mathrm{BGD}(t) = \mathrm{BGD}(0) \times \left( \frac{\mathrm{IP}_A(t) + \mathrm{IP}_B(t)}{\mathrm{IP}_A(0) + \mathrm{IP}_B(0)} \right) \qquad (3)$$

With this formula, we can calculate an occupancy ratio in the presence of
background signal. First note that the occupancy ratio R(t) in the absence of
background is defined as the ratio of the immunoprecipitates of the competitor (A)
and constitutive (B) signals:

$$R(t) = \frac{\mathrm{IP}_A(t)}{\mathrm{IP}_B(t)} \qquad (4)$$


We define a measured occupancy ratio mR(t) that includes the background signal: 


$$\mathrm{mR}(t) = \frac{\mathrm{mIP}_A(t)}{\mathrm{mIP}_B(t)} = \frac{\mathrm{IP}_A(t) + \mathrm{BGD}(t)}{\mathrm{IP}_B(t) + \mathrm{BGD}(t)} \qquad (5)$$


where the second equality arises by substitution from equation (1), assuming that 
the background is the same in the competitor and constitutive signals. Using 
equations (2), (3) and (4), equation (5) can be rewritten as: 


$$\mathrm{mR}(t) = \frac{R(t) + C_0\,[1 + R(t)]}{1 + C_0\,[1 + R(t)]} \qquad (6)$$


where $C_0 = \mathrm{mIP}_A(0)/[\mathrm{IP}_A(0) + \mathrm{IP}_B(0)]$. This constant can be expressed in terms
of measurable quantities by using equations (1), (2) and the previously stated
assumption $\mathrm{IP}_A(0) = 0$ to yield:

$$C_0 = \frac{\mathrm{mR}(0)}{1 - \mathrm{mR}(0)} \qquad (7)$$

where mR(0) is the measured occupancy ratio at time 0. In practice, we calculated
$C_0$ by averaging over the first three time points, all of which showed no
detectable competitor signal. With this estimate of $C_0$, equation (6) enables an
occupancy ratio to be calculated in the presence of a microarray background signal
by using the occupancy ratio R, calculated in the absence of background.
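The algebra behind the background-corrected ratio and its constant can be spelled out in a few lines. This is a reader's sketch that uses only the definitions given above, with A denoting the competitor and B the constitutive protein; it is not taken from the original derivation.

```latex
\begin{align*}
% Background at time t: BGD(0) = mIP_A(0), rescaled by the change in total signal.
\mathrm{BGD}(t) &= \mathrm{mIP}_A(0)\,
    \frac{\mathrm{IP}_A(t) + \mathrm{IP}_B(t)}{\mathrm{IP}_A(0) + \mathrm{IP}_B(0)}
  = C_0\,[\mathrm{IP}_A(t) + \mathrm{IP}_B(t)] \\[4pt]
% Substitute into the measured ratio and divide through by IP_B(t).
\mathrm{mR}(t) &= \frac{\mathrm{IP}_A(t) + C_0\,[\mathrm{IP}_A(t) + \mathrm{IP}_B(t)]}
                      {\mathrm{IP}_B(t) + C_0\,[\mathrm{IP}_A(t) + \mathrm{IP}_B(t)]}
  = \frac{R(t) + C_0\,[1 + R(t)]}{1 + C_0\,[1 + R(t)]} \\[4pt]
% At t = 0 the true competitor signal is zero, so R(0) = 0.
\mathrm{mR}(0) &= \frac{C_0}{1 + C_0}
  \quad\Longrightarrow\quad
  C_0 = \frac{\mathrm{mR}(0)}{1 - \mathrm{mR}(0)}
\end{align*}
```

The last line shows why the constant can be estimated from the earliest time points alone: before the competitor accumulates, the measured ratio depends only on the background term.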

R(t) is the probability that a locus is occupied by the competitor protein divided
by the probability that it is occupied by the constitutive protein. If P(t) is the
probability that the competitor occupies a given locus, then the probability that
the constitutive protein occupies the locus is 1 − P(t), and so R(t) becomes:

$$R(t) = \frac{P(t)}{1 - P(t)} \qquad (8)$$


This probability satisfies the following differential equation:

$$\frac{dP(t)}{dt} = \lambda \left( \frac{A(t)}{A(t) + B(t)} - P(t) \right) \qquad (9)$$


Here λ is the turnover rate at each locus, and A(t) and B(t) are the cellular
concentrations of the free competitor and constitutive proteins. We measured A(t)
and B(t) at all time points using western blot analysis. To determine the turnover
rate λ for each locus we tuned λ to fit the measured occupancy ratio mR(t) at that
locus. Specifically, we varied λ in equation (9) such that the value of R(t) obtained
from equation (8) yields the best fit to our measured occupancy ratio when R(t) is
substituted into equation (6).

The modified turnover model (equation (6)) was implemented in Matlab 2009b
(The MathWorks) and equation (9) was solved numerically using the ode45
function. The Matlab routine ‘lsqcurvefit’ was used to fit the models to
experimental data and extract the turnover rate λ. We sampled a range of different
starting guesses to avoid converging on local minima. The Matlab source code
for the modified turnover model is available online
(http://code.google.com/p/ccc-process/).
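The fitting procedure described above can be sketched in Python, with SciPy's `solve_ivp` and `least_squares` standing in for Matlab's ode45 and lsqcurvefit. This is not the authors' code: the function names, the choice to fit log10(λ), the bounds, the finite-difference step and the starting guesses are all assumptions made for illustration. A(t) and B(t) are passed in as callables, which in practice would interpolate the western blot measurements.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares


def competitor_probability(lam, times, A, B):
    """Integrate dP/dt = lam * (A(t) / (A(t) + B(t)) - P) from P(0) = 0 (equation 9)."""
    def rhs(t, p):
        a, b = A(t), B(t)
        return lam * (a / (a + b) - p)

    sol = solve_ivp(rhs, (0.0, times[-1]), [0.0], t_eval=times,
                    rtol=1e-8, atol=1e-10)
    return sol.y[0]


def measured_ratio(lam, times, A, B, c0):
    """Predicted measured occupancy ratio mR(t), including background."""
    p = competitor_probability(lam, times, A, B)
    r = p / (1.0 - p)                                    # equation (8)
    return (r + c0 * (1.0 + r)) / (1.0 + c0 * (1.0 + r))  # equation (6)


def fit_turnover(times, mr_observed, A, B, c0, guesses=(1e-4, 1e-3, 1e-2)):
    """Fit the turnover rate from several starting guesses and keep the best fit.

    Fitting log10(lambda) keeps the rate positive; this is a convenience
    chosen for the sketch, not something stated in the Methods.
    """
    def residuals(x):
        return measured_ratio(10.0 ** x[0], times, A, B, c0) - mr_observed

    fits = [least_squares(residuals, x0=[np.log10(g)], bounds=(-8.0, 0.0),
                          diff_step=1e-3)
            for g in guesses]
    best = min(fits, key=lambda fit: fit.cost)
    return 10.0 ** best.x[0]
```

A typical call passes the sampling times in seconds, the measured ratios for one locus, and the constant estimated from the earliest time points. Trying several starting guesses mirrors the multi-start strategy described in the text.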

Plasmids. The following plasmids were used in generation of the Rap1 turnover 
strain: pRS403 (ref. 31), pRS405 (ref. 31), pYM20:hphNT1 (ref. 32), p3FLAG- 
KanMx (ref. 33) and pYM-N27 (ref. 32). 

Chromatin immunoprecipitation and DNA amplification. Chromatin immuno- 
precipitation was performed on whole-cell extract from crosslinked cells as 
described previously, using anti-Flag (M2, Sigma), anti-Myc (clone 9E10,
Millipore), and anti-Rap1 (y-300, Santa Cruz Biotechnology).
Immunoprecipitated and/or input DNA was amplified using the GenomePlex Complete
Whole-Genome Amplification (WGA) kit (WGA2-50RXN, Sigma) and then re- 
amplified using GenomePlex WGA Reamplification Kit (WGA3-50RXN, Sigma) 
using the manufacturer’s protocols. DNA was purified using Zymo columns 
according to the manufacturer’s instructions (Zymo Research). 

Hybridization and processing of data from high-resolution HD4 microarrays. 
For Nimblegen high-resolution HD4 microarrays, amplified ChIP material was 
sent directly to Nimblegen where it was labelled and hybridized according to 
protocols in chapters 3 and 4 of the NimbleGen Arrays User’s Guide ChIP-chip 
Analysis, Version 3.1, 27 May 2008. Bi-weight mean scaled ratios are used as input 
for lowess normalization. All HD4 array data are deposited in the Gene Expression 
Omnibus (GEO) under accession GSE32351. 

Modified lowess normalization. Standard lowess normalization results in 
depressed binding ratios at the most highly enriched probes in ChIP-chip experi- 
ments. We therefore implemented a modified lowess normalization designed 
specifically for ChIP-chip based on the method described previously. Our method
varied from the previously published method in that we defined the ‘enriched’
group of probes based on the sites that we used to define Rap1 target enrichment
for our turnover time course. We excluded all probes within ±2,000 bp of a Rap1
binding site, and used all remaining probes as the reference group to calculate the
lowess function for normalization (Supplementary Fig. 8a-d). Each time point is
normalized separately but we use the same group of reference probes for the
normalization. Although we believe that using this modified lowess normalization
approach is the most appropriate way to normalize the data, we find qualitatively
and quantitatively similar Rap1 turnover values without normalization (data not
shown).
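The idea of fitting the lowess trend on reference probes only, then applying it to every probe, can be sketched as follows. Several things here are assumptions for illustration: the trend is fitted as log ratio against average intensity, probes lie on a single coordinate axis, and `lowess_fit` is a bare-bones local linear smoother without robustness iterations. None of the names come from the original pipeline.

```python
import numpy as np


def lowess_fit(x, y, x_new, frac=0.3):
    """Local linear regression with tricube weights, evaluated at x_new."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    x_new = np.asarray(x_new, float)
    k = max(2, int(np.ceil(frac * len(x))))       # neighbours used per local fit
    out = np.empty(len(x_new))
    for i, x0 in enumerate(x_new):
        d = np.abs(x - x0)
        h = np.partition(d, k - 1)[k - 1]         # distance to the k-th nearest point
        if h > 0:
            w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        else:
            w = (d == 0).astype(float)
        sw = w.sum()
        xm = (w * x).sum() / sw
        ym = (w * y).sum() / sw
        sxx = (w * (x - xm) ** 2).sum()
        slope = (w * (x - xm) * (y - ym)).sum() / sxx if sxx > 0 else 0.0
        out[i] = ym + slope * (x0 - xm)
    return out


def modified_lowess_normalize(avg_intensity, log_ratio, probe_pos, site_pos,
                              window=2000, frac=0.3):
    """Subtract a lowess trend fitted only on probes far from binding sites."""
    avg_intensity = np.asarray(avg_intensity, float)
    log_ratio = np.asarray(log_ratio, float)
    probe_pos = np.asarray(probe_pos, float)
    site_pos = np.asarray(site_pos, float)
    # A probe is "enriched" if it lies within `window` bp of any binding site.
    near_site = (np.abs(probe_pos[:, None] - site_pos[None, :]) <= window).any(axis=1)
    reference = ~near_site
    trend = lowess_fit(avg_intensity[reference], log_ratio[reference],
                       avg_intensity, frac)
    return log_ratio - trend, reference
```

Because enriched probes are left out of the fit, their high ratios do not pull the trend upwards, which is the depression of binding ratios that standard lowess is said to cause.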

Hybridization and processing of data from low-resolution PCR-based arrays. 
One microgram of amplified DNA was labelled with either 2'-deoxyuridine
5'-triphosphate (dUTP) Cy5 (PA55022, GE Healthcare) or Cy3 (PA53022, GE
Healthcare) for low-resolution PCR-based arrays. Purified labelled DNA was 
hybridized to PCR-based arrays representing the whole yeast genome and cover- 
ing all coding and non-coding regions at an average resolution of approximately 
800 bp (ref. 10). The time course was performed in duplicate, one in each dye 
orientation, with the Myc and Flag samples then comparatively hybridized to an 
array for each time point. Arrays were scanned using an Axon 4000B scanner and 
analysed using Genepix 6.0 software (Axon). Only spots with <10% saturated 
input pixels and a signal intensity of greater than 500 (background-corrected sum 
of medians for both channels) were used for the analysis. Data were further 
normalized in the UNC microarray database with the normalized median log2
ratio of Rap1-Myc/Rap1-Flag being used for further analysis. All low-resolution
array data are deposited in GEO under accession GSE27377. We did not use these 
ChIP-chip data in any of our analyses except in Supplementary Fig. 3. 

Reverse transcription, complementary-DNA labelling and expression arrays. 
Total RNA was extracted by the hot phenol method as previously described.
Total RNA (30 µg) was reverse transcribed into cDNA using reagents and protocols
provided with SuperScript II reverse transcriptase (Invitrogen; Cat. No. 18064-014)
containing an amino-allyl-dUTP mix (50× aa-dUTP mixture; 1 mg amino-allyl
dUTP (Sigma)) dissolved with 32 µl of 100-mM dATP, dGTP, dCTP, 12.7 µl of
100-mM dTTP, and 19.3 µl of dH2O, and an anchored oligo dT primer (22-mer;
IDT). Reactions were incubated for 2 h at 42 °C, then heated at 95 °C for 5 min and
snap cooled on ice. RNA was hydrolysed by addition of 13 µl of 1-N NaOH and 1 µl
of 0.5-M EDTA followed by incubation at 67 °C. Reactions were then neutralized
with 50 µl of 1 M HEPES pH 7.5. cDNA was purified on Zymo columns (Zymo
Research; D4003) using a seven-volume excess of DNA binding buffer. cDNA was
eluted from columns using 5 µl of 50-mM sodium bicarbonate pH 9.0. cDNA was
fluorescently labelled using Amersham CyDye Post-Labelling Reactive Dye Packs
(RPN5661). Each dye pack was resuspended in 11 µl DMSO and 3 µl of mixture
was used per reaction. Cy dyes and aa-dUTP cDNAs were allowed to couple for 2 h
in the dark. Labelled cDNAs were cleaned up using Zymo columns with a
seven-volume excess of DNA binding buffer and eluted with 10 mM Tris-Cl pH 8.0 and
hybridized to arrays as described previously.

For comparative hybridization, input genomic DNA from the experimental
Rap1 turnover strain was extracted using phenol chloroform. Four micrograms
of genomic DNA was denatured at 100 °C with 10 µg of random hexamer (IDT)
then snap cooled on ice for 10 min. Samples were then incubated with 50 units of
Klenow (exo-) (New England Biolabs (NEB)) and 1X Buffer 2 (NEB) in a total
volume of 50 µl at 37 °C for 2 h. Samples were cleaned up with Zymo columns,
eluted in 5 µl of 5-mM sodium bicarbonate pH 9.0 and coupled to Cy dyes as for
cDNA. Expression studies were performed on PCR-based arrays that were
prepared, processed and analysed as for the low-resolution ChIP arrays.

Defining regions of Rap1 enrichment. Rap1 ChIP-seq data from yeast strain
BY4741 grown in YPD (yeast extract 1%, peptone 2% and dextrose 2%) were used
to determine precise sites of Rap1 binding. Peaks and peak summits were
identified using model-based analysis for ChIP-Seq (MACS) with a bandwidth of 300
and a P-value cutoff of 1 X 10°. Peaks in our turnover data set were identified on
total Rap1 occupancy at time 0 using Peakpicker to ensure that we identified all
Rap1 peaks that were present in our turnover conditions. For analysis, we then
used only MACS ChIP-seq peak regions that had at least 1 bp of overlap with our
time-course peaks, and a z score of >1.5 at time 0. Seven regions with a z score of
>1.5 that were identified at time 0 of the Rap1 time course but not in the
ChIP-seq experiment were also included to ensure full representation of
Rap1-enriched regions in our experiment. Of the 457 total Rap1 peak regions identified,
18 were not analysed. Fifteen targets had an estimated residence time of under
500 s, which is too short to measure with our system (Supplementary Fig. 6). Three
targets that had residence times that exceeded 1 X 10*s and showed exceptionally
poor fits to the model were also excluded. The log2 Myc/Flag levels for all
probes that fell within ±150 bp of peak summits were averaged to generate a
Myc/Flag value for each time point for each target. On average, eight probes
contributed to the Myc/Flag signal for Rap1 targets. Peak summits were used to
assign target regions to promoters or coding regions for further analysis.
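The per-target signal described above, an average over probes near each peak summit, can be sketched as below. The function name is hypothetical, and treating each probe as a single coordinate (for example its centre) is an assumption, since the text does not say how probe position was defined.

```python
import numpy as np


def summit_signal(probe_pos, log2_ratio, summit, half_window=150):
    """Average log2 Myc/Flag over probes within half_window bp of a peak summit.

    log2_ratio may be 1-D (one value per probe) or 2-D (probes x time points);
    with 2-D input, one average per time point is returned.
    """
    probe_pos = np.asarray(probe_pos, float)
    log2_ratio = np.asarray(log2_ratio, float)
    in_window = np.abs(probe_pos - summit) <= half_window
    if not in_window.any():
        # No probe covers this summit: report missing data instead of failing.
        return np.full(log2_ratio.shape[1:], np.nan)
    return log2_ratio[in_window].mean(axis=0)
```

Applying this to every summit gives one Myc/Flag time series per target, which is the input the turnover fit needs.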
Telomeric regions were tiled using only uniquely mapping probes, making 
signal discontinuous in these regions and making peak calling difficult. For this 
reason, telomeres were defined by annotations from the Saccharomyces genome 
database (http://www.yeastgenome.org/). We excluded telomeres from any
analysis that relied on our turnover metric because they contain many arrayed Rap1
binding sites within their AC-rich repeats. In theory, as the number of
Rap1-binding sites detected by an individual microarray probe increases, the probability
that either isoform of Rap1 will be detected at that probe increases. This violates
some assumptions of our turnover metric, which would theoretically lead to
artificially short residence-time estimates. Despite this, empirically we see no
relation between Rap1 residence times and motif number or density
(Supplementary Fig. 10).
Motif discovery. The 439 Rap1-bound target regions (excluding telomeres) were
placed into four categories based on their turnover properties: longest (110 targets),
long (110 targets), short (110 targets) and shortest (109 targets). The DNA sequences
for each Rap1 target region in each group were then used as input for the web-based
interface for BioProspector (http://ai.stanford.edu/~xsliu/BioProspector/). Default
parameters were used, with the exception of the width of the first motif block, which
was changed to ‘13’, and ‘S. cerevisiae intergenic’ was used as a genome-background
model. Rap1’s telomeric motif was determined from the full telomeric sequences
of the 26 telomeres that were uniquely mappable on our arrays. Weblogo
(http://weblogo.berkeley.edu/logo.cgi) was used to generate a visual representation of the
position-weight matrix output from BioProspector. The 439 Rap1 targets were
similarly grouped by their occupancy properties to determine Rap1 motifs for
Rap1 targets grouped by occupancy. The default settings on the motif scanning
program Clover were used to detect Rap1 motifs genome-wide using a previously
published Rap1 position-weight matrix.
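The four residence groups can be reproduced by ranking targets and splitting the ranking into near-equal parts. This is a minimal sketch with a hypothetical function name; the text gives the group sizes but not how ties or the uneven remainder were handled, so putting the smaller group last is an assumption that happens to match the stated sizes.

```python
import numpy as np


def residence_groups(residence_times,
                     labels=("longest", "long", "short", "shortest")):
    """Rank targets by residence time and split them into near-equal groups.

    Returns a dict mapping each label to the indices of the targets in it.
    """
    order = np.argsort(residence_times)[::-1]      # longest residence first
    # array_split gives the leading groups one extra member when the
    # number of targets is not divisible by the number of groups.
    return dict(zip(labels, np.array_split(order, len(labels))))
```

With 439 targets this yields groups of 110, 110, 110 and 109. The same function applied to occupancy values instead of residence times would give the occupancy-based grouping.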
External data sets. Values from existing data sets with a one-to-one
correspondence to the arrayed elements in our study were used as published. For data sets
derived from arrays that did not match our probe set, log2 ratios and z scores were
calculated for each array probe, for each replicate of the external data set. Z scores
were defined as the number of standard deviations that a probe’s log2 ratio was
from the mean log2 ratio of all probes on the array. In cases with several replicates,
average z scores were used to represent each probe. To map the data back to our
experiments, the average z scores of the array probes for the specific data set that
were contained within the promoter or coding region assigned to each Rap1 target
were used for comparison. For histone H3 turnover data, the highest value for a
probe that fell within promoters associated with peak summits for target regions
was used for our analysis. For Rap1 nucleosome interaction data we summed all
the detected interactions that fell within each Rap1 target region.
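The z-score definition above translates directly into code. The sketch below uses hypothetical function names and assumes the population standard deviation, since the text does not say which estimator was used.

```python
import numpy as np


def probe_z_scores(log2_ratios):
    """Standard deviations of each probe's log2 ratio from the array-wide mean."""
    x = np.asarray(log2_ratios, float)
    return (x - x.mean()) / x.std()    # population standard deviation (assumed)


def replicate_averaged_z(replicates):
    """Compute z scores within each replicate array, then average per probe."""
    return np.mean([probe_z_scores(r) for r in replicates], axis=0)
```

Standardizing within each array before averaging puts replicates with different dynamic ranges on a common scale.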
During protein synthesis, the ribosome accurately selects transfer
RNAs (tRNAs) in accordance with the messenger RNA (mRNA)
triplet in the decoding centre. tRNA selection is initiated by
elongation factor Tu, which delivers tRNA to the aminoacyl
tRNA-binding site (A site) and hydrolyses GTP upon establishing
codon-anticodon interactions in the decoding centre. At the
following proofreading step the ribosome re-examines the tRNA
and rejects it if it does not match the A codon. It was suggested
that universally conserved G530, A1492 and A1493 of 16S
ribosomal RNA, critical for tRNA binding in the A site, actively
monitor cognate tRNA, and that recognition of the correct
codon-anticodon duplex induces an overall ribosome conformational
change (domain closure). Here we propose an integrated
mechanism for decoding based on six X-ray structures of the 70S
ribosome determined at 3.1-3.4 Å resolution, modelling cognate
or near-cognate states of the decoding centre at the proofreading
step. We show that the 30S subunit undergoes an identical domain
closure upon binding of either cognate or near-cognate tRNA. This
conformational change of the 30S subunit forms a decoding centre
that constrains the mRNA in such a way that the first two
nucleotides of the A codon are limited to form Watson-Crick base pairs.
When U•G and G•U mismatches, generally considered to form
wobble base pairs, are at the first or second codon-anticodon
position, the decoding centre forces this pair to adopt the geometry
close to that of a canonical C•G pair. This by itself, or with
distortions in the codon-anticodon mini-helix and the anticodon loop,
causes the near-cognate tRNA to dissociate from the ribosome.

[Figure 1 panel labels. First codon-anticodon position: near-cognate, mRNA/Phe 5'-UUU-3' with tRNA^Leu_2 3'-GAG-5'; cognate, mRNA/Leu 5'-CUC-3' with tRNA^Leu_2 3'-GAG-5'. Second position: near-cognate, mRNA/Cys 5'-UGC-3' with tRNA^Tyr 3'-AUG-5'; cognate, mRNA/Tyr 5'-UAC-3' with tRNA^Tyr 3'-AUG-5'.]


We determined six X-ray structures of the 70S ribosome at 3.1-3.4 Å resolution
(Supplementary Tables 1 and 2) programmed by 30-nucleotide-long mRNAs with the AUG
codon and tRNAfMet in the peptidyl tRNA-binding site (P site) and the A site
occupied by tRNALeu2 or tRNATyr (Fig. 1a and Methods). In one set of experiments,
tRNALeu2 and tRNATyr were bound to their respective cognate codons CUC and UAC in
the A site. In a second set of experiments, we modelled near-cognate states of the
ribosome (Supplementary Fig. 1). These states of the ribosome naturally occur during
protein synthesis but with low probability because binding of cognate tRNA is kinetically

Figure 1 | Codon-anticodon interactions in the decoding centre on the 70S
ribosome. a, The mRNA path on the 70S ribosome with the decoding (DC) area
indicated. b, Close-up view of the mRNA P/A kink with near-cognate tRNALeu2.
Magnesium ions are in green. c, d, The first base pairs of the near-cognate (c) and
cognate (d) codon-anticodon duplexes and their interactions with A1493 of 16S
rRNA. e, f, The second base pairs of the near-cognate (e) and cognate (f) codon-
anticodon duplexes and their interactions with G530 and A1492 of 16S rRNA.
g, h, A classical wobble U•G pair (g) versus canonical C•G interactions (h); a
magnesium ion interacting with the base pair is coordinated by protein S12 and
part of 16S rRNA. All graphical representations were prepared with PyMol.
Probable hydrogen bonds within 3 Å distance are indicated by dashed lines;
2Fo − Fc electron density maps are contoured at 1.2 sigma.
favoured. We made the ribosome accept near-cognate tRNA by giving it
only one type of tRNA carrying a mismatch to the A codon along with
tRNAfMet for the P site. In these complexes the A site was filled either by
tRNALeu2 and codon UUU with a U•G mismatch in the first position of
the codon-anticodon helix or by tRNATyr and codon UGC with a G•U
mismatch in the second position.
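
The pairing pattern of the four A-site complexes can be tabulated with a short script. This is only an illustrative sketch, not part of the structural analysis: the function name `classify_pairs` and the labels are hypothetical, and the codons and anticodons are taken from the text and Fig. 1 (anticodons written 3'->5' so that each index faces the matching codon position).

```python
# Watson-Crick partners for RNA bases (codon base, anticodon base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# G/U combinations, which in free RNA usually form wobble pairs.
GU_PAIRS = {("G", "U"), ("U", "G")}


def classify_pairs(codon_5to3, anticodon_3to5):
    """Label each of the three codon-anticodon positions.

    Index i of the codon (5'->3') faces index i of the anticodon (3'->5').
    """
    labels = []
    for codon_base, anticodon_base in zip(codon_5to3, anticodon_3to5):
        pair = (codon_base, anticodon_base)
        if pair in WATSON_CRICK:
            labels.append("Watson-Crick")
        elif pair in GU_PAIRS:
            labels.append("G/U mismatch")
        else:
            labels.append("other mismatch")
    return labels


# A-site codon and anticodon (3'->5') for the four complexes in the text.
COMPLEXES = {
    "Leu cognate": ("CUC", "GAG"),
    "Leu near-cognate": ("UUU", "GAG"),
    "Tyr cognate": ("UAC", "AUG"),
    "Tyr near-cognate": ("UGC", "AUG"),
}

for name, (codon, anticodon) in COMPLEXES.items():
    print(name, classify_pairs(codon, anticodon))
```

Note that the UUU codon facing the GAG anticodon carries a G/U combination at both the first and the third position; only the first two positions are the ones the decoding centre is reported to constrain.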

As shown earlier, the mRNA forms a kink between the P and A codons (P/A kink), a
universal feature of the mRNA path on the 70S ribosome that is stabilized by the
P site tRNA, 16S ribosomal RNA (rRNA) and magnesium ions (Fig. 1b and Supplementary
Fig. 2). The single mismatch states described above represent bona fide near-cognate
complexes expected to have standard wobble U•G and G•U base pairs. At our data
resolution (3.1-3.5 Å) we can confidently assign the general base pairing
(Supplementary Figs 3 and 4). The electron density maps unambiguously demonstrate
that U4 and G5 of the A codons UUU and UGC do not show the anticipated wobble
interactions with G36 in tRNALeu2 and U35 in tRNATyr, respectively. Instead, U4•G36
and G5•U35 at the first and second positions of the codon-anticodon duplexes form
base pairs similar to a standard Watson-Crick G•C pair (Fig. 1c, e and Supplementary
Fig. 5). G•C-like G•U or G•T pairs have been shown earlier for RNA and DNA in other
X-ray structures. When U•G is at the third codon-anticodon position we observe
standard U•G wobble pairing (Fig. 1g, h). Unexpectedly, nucleotides A1493, A1492 and
G530 of 16S rRNA in helix 44 (h44), which contact the first and the second pairs of
the codon-anticodon helix, interact with these unusual U4•G36 and G5•U35 pairs
identically to the way they interact with canonical Watson-Crick base pairs C4•G36
and A5•U35 (Fig. 1c-f). These findings are in contradiction with studies where these
nucleotides were given a role as monitors and discriminators of canonical
Watson-Crick pairs in the decoding process. Our structures show that G530, A1492 and
A1493 form a static part of the decoding centre, defining its spatial and
stereochemical properties (Fig. 2a, b).

The observed non-wobble U•G pair differs from that seen in previous X-ray studies
that were based on the 30S subunit alone (Fig. 2c and Supplementary
Figs 5 and 6). For example, for the study of the mismatch at the first
Figure 2 | The nature of the decoding centre. a, b, The overall conformations
of universally conserved G530, A1492 and A1493 of 16S rRNA in the cognate
structures are identical to those in the near-cognate models when the
mismatches are at the first (a) or second (b) codon-anticodon positions.
c, Differences between the position of the first uridine in the UUU codon base-
paired to the GAG anticodon of tRNALeu2 from our 70S structure and from the
30S model. d, Comparison of the anticodon loops of tRNATyr in the cognate
(red) and near-cognate (cyan) states. e, Rearrangements of rRNA helices h44
and H69 in the near-cognate state upon binding of the aminoglycoside
paromomycin (PAR). The near-cognate structures with tRNATyr are shown (a
similar effect of PAR is observed with tRNALeu2, see also Supplementary Fig.
10). f, Magnified view of the changes in the A1493 phosphate position induced
by PAR. Superimpositions in d, e and f were performed using 23S rRNA as
reference.
position, 30S crystals soaked with an anticodon stem-loop of tRNALeu2
and a hexauridine (U6) mRNA displayed a classical wobble U•G pair.
There the P codon was mimicked by the 3'-end of 16S rRNA, so the U6
mRNA could only bind to the A site and downstream, leading to a
situation where the mRNA was not covalently linked between the P
and A codons. By superimposing the A site GAG anticodons from our
near-cognate tRNALeu2 structure and the 30S model we found that the
first nucleotide of the A codon is positioned differently (Fig. 2c and
Supplementary Fig. 6). Because it does not have the natural restraint
coming from being covalently bound to the P codon, this first nucleotide
in the A codon has the freedom to move so it can form a wobble
U•G pair. However, in our structure the P/A mRNA kink specifically
directs the first nucleotide of the A codon to form Watson-Crick-like
interactions with G36 of tRNALeu2 (Fig. 1c).

The positional restrictions imposed by the decoding centre on the first two
near-cognate U•G codon-anticodon pairs may result in differences in geometry
compared with the corresponding cognate helices (Supplementary Figs 7 and 8).
Although the resolution of the data sets does not allow us to determine the exact
values of these deviations in the base pairs, the tendencies are clear. A noticeable
change is an increase in buckling at the first and third codon-anticodon positions
with the U4•G36 mismatch (Supplementary Fig. 7) and at the second and third
positions with the G5•U35 mismatch (Supplementary Fig. 8). These deviations could
disturb the base stacking network within the near-cognate codon-anticodon helices
and deform the entire anticodon loop structure (Fig. 2d, Supplementary Fig. 9 and
Supplementary Movie 1). This deformation might influence the position of helix 69
(H69) of 23S rRNA, whose universally conserved A1913 protrudes into the decoding
centre (Fig. 2e).

Additional structures of the near-cognate states determined in the presence of the
miscoding aminoglycoside paromomycin (Supplementary Tables 1 and 2) reveal a
movement of H69 accompanied by rearrangements of the intersubunit bridge B2a between
h44 and H69 (Fig. 2e and Supplementary Movie 2). These distortions are most probably
caused by binding of the antibiotic, which strongly shifts the A1493 phosphate group
(Fig. 2f). Although this shift does not much alter the interactions of A1493 with
U4•G36 and of A1492/G530 with G5•U35 (Supplementary Figs 7g and 8h), these local
changes modulate the B2a bridge. H69 is displaced towards the tRNA, which probably
enhances the interaction surface of H69 with the tRNA D-stem. In the presence of
paromomycin the position of H69 is closer to that observed for the cognate state
(Supplementary Fig. 10). Furthermore, displacement of the A1493 phosphate group
relaxes the decoding pocket from the side of the A codon (Fig. 2e) and changes the
deformation of the near-cognate codon-anticodon helix (Supplementary Movie 2). This
novel understanding of the paromomycin action therefore differs from the previously
suggested mechanism in decoding where it was proposed to influence the monitoring
capabilities of A1492 and A1493 (ref. 18). The observed moderate structural
rearrangements with paromomycin are consistent with its measured effect at the
proofreading step.

We find that both near-cognate tRNAs induce rearrangements of
the 30S subunit known as domain closure (that is, shoulder movement
and head rotation) as described for cognate tRNA (Fig. 3a).
This implies that domain closure is an inherent quality of the ribosome
in response to binding of any tRNA to the A site and is a prerequisite
for formation of the decoding pocket.

Initially, the mechanism underlying the decoding process was
deduced from pioneering X-ray structures of the isolated 30S subunit
where crystals were soaked with U6 and an anticodon stem-loop mimicking
mRNA and tRNA, respectively. Besides those limitations, all attempts to
model near-cognate states on the 30S subunit were performed in the
presence of paromomycin, which, on the one hand, stimulated an
ordered binding of the near-cognate tRNA analogs, but, on the other,
Figure 3 | The principle of decoding. a, Superposition of 23S rRNA from
near-cognate and cognate structures with tRNALeu2 shows identical domain
closure in the near-cognate and cognate states. b, Representation of
conformations of the core nucleotides composing the decoding centre
depending on its functional states (data with the vacant 70S ribosome are
unpublished). c, d, Illustration of the decoding principle: together with the
constraints imposed on the A codon by the P/A kink coordinated by a
magnesium ion (green sphere), the DC (h18, h44 and protein S12 from the
small subunit and H69 from the large subunit) restricts the allowed geometry of
the first two nucleotides of the codon. No such restraints are imposed on the
third base pair. A near-cognate tRNA with G•U in the first or second position is
forced to form Watson-Crick-like base pairs (middle panels). This creates
repulsion or requires energy for tautomerization (shown in pink), which by
itself can be the source of the tRNA discrimination. The right panels illustrate
the impossible situation when standard wobble base pairs (shown in red) occur
in either the first or second positions of the codon-anticodon duplexes.
made it difficult to distinguish the independent effect of the mismatch
on the 30S decoding centre. The cognate and near-cognate models of
the 70S ribosome described here, with or without paromomycin,
together with our previous structures, give rise to novel insights
into decoding with revisited roles for universally conserved nucleotides
G530, A1492 and A1493 of 16S rRNA (Fig. 3b). Although these
nucleotides form extensive contacts to the A-minor groove of a
codon-anticodon helix, G530, A1492 and A1493 do not actively sense
the correct Watson-Crick base-pairing geometry and thus do not
discriminate against near-cognate tRNA.

We propose that upon binding of cognate or near-cognate tRNA to the 70S ribosome,
the small subunit undergoes domain closure around the anticodon loop of the tRNA.
The closure results in formation of a tight decoding centre that restricts the first
two nucleotides of the A codon to form exclusively Watson-Crick base pairs with the
tRNA anticodon (Fig. 3c, d). Owing to our current data resolution, we cannot
precisely identify the hydrogen bond pattern of the mismatches in the near-cognate
codon-anticodon helices, but tautomerism is a plausible chemical mechanism. An
alternative explanation for the tRNA discrimination source could be repulsion in the
U•G pair (Fig. 1c, e). Energy expenditure for formation of tautomers (or repulsion
energy) could constitute the sole cause for the very efficient rejection of
near-cognate tRNAs by the ribosome. Additionally, the observed deformation of the
anticodon may lead to alterations in the codon-anticodon mini-helix and propagate
through the rest of the near-cognate tRNA molecule, destabilizing it and causing the
tRNA to dissociate from the ribosome (Supplementary Fig. 11). This corroborates the
idea that evolutionarily tuned sequences of tRNAs play an active role in the tRNA
selection process. Recent X-ray structures of the 70S ribosome with cognate tRNA and
elongation factor Tu demonstrated that binding of an anticodon loop of tRNA in the
decoding centre is nearly identical to that shown for accommodated tRNA. This
information prompts us to hypothesize that the same mechanism described here for the
proofreading step governs the initial tRNA selection step.


METHODS SUMMARY 


Ribosomes were purified from Thermus thermophilus cells as described before.
For all complexes, mRNA, tRNAfMet and tRNALeu2 (or tRNATyr) were present in
fivefold stoichiometric excess of the ribosome concentration. Complexes with
paromomycin were obtained by including the antibiotic (30 µM) in the incubation
mixture. Crystals were grown at 24 °C by sitting-drop vapour diffusion as
described before. All crystals belonged to space group P2₁2₁2₁ and contained two
molecules per asymmetric unit. A very low dose mode with high redundancy was
used for data collection. The previously determined structure, with tRNAs, mRNA and
metal ions removed, was used for refinement with Phenix. Throughout refinement, no
base-pair restraints were used between tRNAs and mRNAs to avoid bias towards
standard base-pair geometry.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Complex formation and crystallization. Ribosomes were purified from Thermus
thermophilus cells as described before. The 30-nucleotide-long mRNA constructs
I-IV (see below) were purchased from Dharmacon. In all sequences the
AUG start codon is underlined and the A codons are in bold. Purified native
uncharged Escherichia coli tRNAfMet, tRNALeu2 and tRNATyr were supplied by
Chemical Block. Aminoglycoside antibiotic paromomycin was purchased from
Sigma-Aldrich.

The cognate and near-cognate complexes were formed in 10 mM tris-acetate,
pH 7.0, 40 mM KCl, 7.5 mM Mg(Ac)2, 0.5 mM DTT by incubating 70S ribosomes
(3 µM) with mRNA-I, -II, -III or -IV and tRNAfMet for 10 min at 37 °C. Then
tRNALeu2 and tRNATyr were added to the mixtures with mRNA-I or -II and
mRNA-III or -IV, respectively, and the complexes were further incubated for
30 min. For all complexes mRNA, tRNAfMet and tRNALeu2 (or tRNATyr) were
present in fivefold stoichiometric excess of the ribosome concentration.
Complexes with paromomycin were obtained by including the antibiotic
(30 µM) in the incubation mixture containing mRNA-II with tRNALeu2 and
mRNA-IV with tRNATyr. Crystals were grown at 24 °C by sitting-drop vapour
diffusion as described before. In agreement with previous results, initiator
tRNAfMet was found in the P site of all complexes and either tRNALeu2 or
tRNATyr was found in the A site (tRNAfMet was easily distinguishable from
tRNALeu2 and tRNATyr based on the large variable loops in those tRNAs). In
the complexes with mRNA III and IV, the E site was occupied by tRNATyr.
However, in complexes with mRNA I and II, the quality of the density did not
allow identification of the E site tRNA, which was then modelled as tRNAfMet.
mRNA-I 5'-GGCAAGGAGGU(U)4AUGCUC(U)9-3' (cognate for tRNALeu2).
mRNA-II 5'-GGCAAGGAGGU(U)4AUGUUU(U)9-3' (near-cognate for tRNALeu2).
mRNA-III 5'-GGCAAGGAGGU(A)4AUGUAC(A)9-3' (cognate for tRNATyr).
mRNA-IV 5'-GGCAAGGAGGU(A)4AUGUGC(A)9-3' (near-cognate for tRNATyr).
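
As a consistency check on the construct notation, the sequences can be expanded and measured. The sketch below assumes that the repeat counts are 4 and 9 (the values that give the stated length of 30 nucleotides); `build_mrna` and `CONSTRUCTS` are hypothetical names introduced only for this illustration.

```python
# Shared 5' region of all four constructs, as listed above.
PREFIX = "GGCAAGGAGGU"


def build_mrna(spacer_base, a_codon):
    """Expand the shorthand 5'-PREFIX(N)4 AUG <A codon> (N)9-3'."""
    return PREFIX + spacer_base * 4 + "AUG" + a_codon + spacer_base * 9


CONSTRUCTS = {
    "mRNA-I": build_mrna("U", "CUC"),    # cognate for the Leu tRNA
    "mRNA-II": build_mrna("U", "UUU"),   # near-cognate for the Leu tRNA
    "mRNA-III": build_mrna("A", "UAC"),  # cognate for the Tyr tRNA
    "mRNA-IV": build_mrna("A", "UGC"),   # near-cognate for the Tyr tRNA
}

for name, seq in CONSTRUCTS.items():
    start = seq.index("AUG")              # first AUG = P-site start codon
    a_codon = seq[start + 3:start + 6]    # codon immediately 3' of AUG
    print(name, len(seq), "nt, A codon:", a_codon)
```

Each expanded sequence is 30 nucleotides long, with the A codon directly downstream of the AUG start codon.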
Data collection, processing and structure determination. All crystals belong to
space group P2₁2₁2₁ and contain two molecules per asymmetric unit. Data on all
six complexes were collected at 100 K at the Synchrotron Light Source,
Switzerland, using the Pilatus 6M detector. A very low dose mode was used and
data were collected with very high redundancy. The previously determined structure,
with tRNAs, mRNA and metal ions removed, was used for refinement with Phenix. The
initial model was correctly placed within each data set by rigid body refinement
with each molecule as a rigid body. This was followed by rigid body refinement of
individual subunits. After positional and B-factor refinement, the resulting
electron density maps were inspected and the tRNAs and mRNA ligands were built in
these unbiased maps. In all of these and the following refinement rounds, no
base-pair restraints were used between tRNAs and mRNAs to avoid bias towards perfect
base-pair geometry. After several cycles of manual rebuilding followed by positional
and individual isotropic B-factor refinement, magnesium ions were added and a final
refinement round took place. A summary of the crystallographic data and refinement
statistics is given in Supplementary Tables 1 and 2. Supplementary Table 3 shows the
average B-factor for the entire structure as well as average B-factors for the
decoding centre and the codon-anticodon helix. From this table it is seen that the
average B-factors for the substructure comprising the decoding centre (G530,
A1492, A1493 from 16S and A1913 from 23S), as well as the codon-anticodon
helices (nucleotides 34-36 of tRNA in the A site and the corresponding codon of
mRNA), are lower than the overall B-factor for the entire ribosome structure.
The decoding centre is therefore among the most accurately determined
parts of these models.

To verify that the base-pair geometry described in the paper is correct, we
performed many additional independent refinement rounds with base-pair geometries
restrained to various standards (Watson-Crick, wobble, etc.) so that we could be
confident about the reported geometries. Supplementary Figs 3 and 4 show OMIT
averaged kick maps (ref. 31) of the G•U mismatch for the two near-cognate complexes.
These unbiased maps show that a wobble conformation of these base pairs would
not fit into the electron density and clearly demonstrate that a Watson-Crick
conformation is the only plausible fit.


31. Pražnikar, J., Afonine, P. V., Gunčar, G., Adams, P. D. & Turk, D. Averaged kick
maps: less noise, more signal and probably less bias. Acta Crystallogr. D 65,
921-931 (2009).
Telomerase RNA biogenesis involves sequential
binding by Sm and Lsm complexes


Wen Tang, Ram Kannan, Marco Blanchette & Peter Baumann


In most eukaryotes, the progressive loss of chromosome-terminal 
DNA sequences is counteracted by the enzyme telomerase, a 
reverse transcriptase that uses part of an RNA subunit as template 
to synthesize telomeric repeats. Many cancer cells express high 
telomerase activity, and mutations in telomerase subunits are 
associated with degenerative syndromes including dyskeratosis 
congenita and aplastic anaemia. The therapeutic value of altering 
telomerase activity thus provides ample impetus to study the bio- 
genesis and regulation of this enzyme in human cells and model 
systems. We have previously identified a precursor of the fission 
yeast telomerase RNA subunit (TER1) and demonstrated that the
mature 3'-end is generated by the spliceosome in a single cleavage
reaction akin to the first step of splicing. Directly upstream and
partly overlapping with the spliceosomal cleavage site is a putative
binding site for Sm proteins. Sm and like-Sm (LSm) proteins
belong to an ancient family of RNA-binding proteins represented
in all three domains of life. Members of this family form ring
complexes on specific sets of target RNAs and have critical roles 
in their biogenesis, function and turnover. Here we demonstrate 
that the canonical Sm ring and the Lsm2-8 complex sequentially 
associate with fission yeast TER1. The Sm ring binds to the TER1 
precursor, stimulates spliceosomal cleavage and promotes the 
hypermethylation of the 5’-cap by Tgs1. Sm proteins are then 
replaced by the Lsm2-8 complex, which promotes the association 
with the catalytic subunit and protects the mature 3’-end of TER1 
from exonucleolytic degradation. Our findings define the sequence 
of events that occur during telomerase biogenesis and characterize 
roles for Sm and Lsm complexes as well as for the methylase Tgs1. 

In eukaryotes, seven Sm proteins (SmB, SmD1, 2 and 3, SmE, SmF
and SmG) form a heteroheptameric complex at U-rich Sm-binding
sites (AU4-6GR) of various small nuclear RNAs (snRNAs) including
the spliceosomal snRNAs U1, U2, U4 and U5 (refs 4, 5). Assembly of
Sm proteins in vivo requires the help of the survival of motor neuron
protein (SMN), mutations in which result in spinal muscular atrophy.
At least two Sm-like complexes have been characterized. The Lsm1-7
complex functions in messenger RNA (mRNA) degradation and the
Lsm2-8 complex is involved in the maturation of various polymerase
III transcripts and ribosomal RNAs. Purified Lsm2-8 binds to the
3'-terminal U-tract on U6, but not to the internal U-rich Sm sites in
U1, U2, U4 and U5 snRNAs, illustrating that Sm and Lsm complexes
have different sets of target RNAs.
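
The Sm-binding consensus quoted above (an A, four to six U residues, a G, then a purine R) translates directly into a regular expression. The following is a minimal sketch for scanning a sequence for candidate sites; `find_sm_sites` is a hypothetical helper and the example sequence in the comment is invented, not taken from TER1 or any snRNA.

```python
import re

# A, then 4-6 U, then G, then a purine (R = A or G in IUPAC notation).
SM_SITE = re.compile(r"AU{4,6}G[AG]")


def find_sm_sites(rna):
    """Return (start index, matched text) for each candidate Sm site.

    DNA-style input is accepted: T is converted to U before matching.
    """
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in SM_SITE.finditer(rna)]


# Invented example: one site with five U residues starting at index 2.
print(find_sm_sites("CCAUUUUUGACC"))
```

A match only indicates consistency with the consensus; whether Sm or Lsm proteins actually bind must be tested experimentally, as the text goes on to describe.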

Sm-binding sites are also found near the 3'-ends of telomerase RNA subunits from
diverse yeasts and are important for RNA processing and/or stability. Actual binding
of Sm proteins has been demonstrated for TLC1, the telomerase RNA from Saccharomyces
cerevisiae, but the functional consequences of this interaction have remained
largely unexplored. The Sm-binding site in TLC1 is located several nucleotides
upstream of the mature 3'-end. In contrast, spliceosomal cleavage of
Schizosaccharomyces pombe TER1 truncates the putative Sm-binding site by one
nucleotide, which may compromise stable association of the Sm ring with mature TER1.
We therefore set out to examine which proteins bind to the 3'-end of mature TER1,
and to determine the function of the putative Sm site for TER1 biogenesis
and stability.

A strategy was developed to examine the 3'-end of TER1 by
massively parallel sequencing to obtain a quantitative measure of
3'-end sequence distribution and to identify the most abundant terminal
sequences (Fig. 1a). This analysis revealed that, after spliceosomal
cleavage, over 60% of TER1 molecules lost additional nucleotides at the
3'-end and terminate in a stretch of three to six uridines (Fig. 1b). The
3'-end of most of TER1 therefore resembles the 3'-end of U6 snRNA,
which is bound by the Lsm2-8 complex. To test whether Sm or Lsm
proteins associate with TER1, carboxy-terminal c-Myc epitope tags
were inserted at the genomic loci of all Sm and Lsm proteins.
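
The kind of tally behind a statement such as "terminate in a stretch of three to six uridines" can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: it assumes reads that have already been trimmed of adaptor and added poly(A) sequence, and the function names and example reads are hypothetical.

```python
def terminal_u_count(seq):
    """Number of consecutive U residues at the 3' end of a read."""
    seq = seq.upper().replace("T", "U")
    return len(seq) - len(seq.rstrip("U"))


def fraction_with_u_tract(reads, low=3, high=6):
    """Fraction of reads ending in a run of `low` to `high` uridines."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if low <= terminal_u_count(read) <= high)
    return hits / len(reads)


# Invented reads: two end in 3-6 U, two do not.
example_reads = ["ACGUUU", "ACGUUUUU", "ACGU", "ACG"]
print(fraction_with_u_tract(example_reads))
```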

Immunoprecipitations were performed with a subset of strains that 
did not show overt growth defects, expressed tagged proteins and 
maintained telomeres (Supplementary Fig. 1). The snRNA U1 control 
specifically co-precipitated with Sm proteins, confirming that the 
epitope tags did not interfere with immunoprecipitation of RNP 
complexes (Fig. 1c). TER1 co-immunoprecipitated with all four Sm 
Figure 1 | TER1 RNA associates with Sm and Lsm proteins. a, Method used
to map the 3'-end distribution of TER1 post spliceosomal cleavage. RNA is
depicted in orange, DNA in blue. PAP, poly(A) polymerase; RT, reverse
transcription; PCR, polymerase chain reaction; bc, barcode. b, Distribution of
3'-end positions in mature TER1 from wild-type cells. The average of four
experiments is shown; error bars, standard deviation; a total of 23 x 10° 
sequences were scored. c, Northern blot of RNA isolated from 
immunoprecipitations with anti-c-Myc antibodies. Input and 
immunoprecipitation (IP) supernatant (s/n) represent 10% of the sample. An 
asterisk marks the position of the TER1 precursor. The lower band corresponds 
to the mature form of TER1. d, Telomerase activity assay performed on beads 
after c-Myc immunoprecipitation of tagged proteins as indicated above each 
lane. Activity was quantified relative to the Trt1 immunoprecipitate sample. A
100-mer [32P]oligonucleotide was used as recovery and loading control (LC).
proteins tested (Fig. 1c, lanes 2-4, and Supplementary Fig. 2a), including
members of each of the three Sm subcomplexes. Strikingly though,
several-fold more TER1 was recovered from Lsm immunoprecipitates,
resulting in an approximately 80% depletion of TER1 from the
immunoprecipitation supernatant (Fig. 1c, lanes 5-7). TER1 precipitated with all
subunits of the Lsm2-8 complex (Fig. 1c and Supplementary Fig. 2b),
but not with Lsm1 (Fig. 1c, lane 8), the subunit specific to the Lsm1-7
complex.

To determine whether Sm and/or Lsm are associated with active telomerase, direct in
vitro activity assays were performed on immunoprecipitates. Telomerase activity was
detected in all samples, but was 20-fold higher in Lsm3 and Lsm4 than in Smb1 and
Sme1 immunoprecipitates (Fig. 1d and Supplementary Fig. 2c). In part this can be
explained by the lower recovery of telomerase with Sm proteins, as judged by
quantification of telomerase RNA on northern blots (Supplementary Fig. 2c, d).
However, even after normalization to the amount of TER1 in each immunoprecipitate,
Lsm-associated telomerase activity was still 2.8-fold higher than that associated
with Sm proteins. The simplest explanation for this observation is that a fraction
of Sm-associated TER1 is not yet associated with the catalytic subunit of
telomerase. Indeed, further experiments confirmed that Sm binding precedes
Trt1 binding to TER1.
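
The two fold-changes quoted above can be related by simple arithmetic. The sketch below assumes that normalization means dividing activity by the amount of TER1 recovered; under that assumption the ratio of the raw to the normalized fold-change gives the implied difference in TER1 recovery between Lsm and Sm immunoprecipitates. The function name is hypothetical and the derived figure is not a value reported in the text.

```python
def implied_recovery_ratio(raw_fold, normalized_fold):
    """Fold-difference in TER1 recovery implied by the two activity ratios.

    Assumes normalized activity = raw activity / TER1 amount, so that
    raw_fold = normalized_fold * (TER1 recovered, Lsm / TER1 recovered, Sm).
    """
    return raw_fold / normalized_fold


# 20-fold raw difference, 2.8-fold after normalization (values from the text).
ratio = implied_recovery_ratio(20.0, 2.8)
print(f"Implied Lsm/Sm difference in TER1 recovery: about {ratio:.1f}-fold")
```

Under this reading, roughly a sevenfold difference in recovered TER1 accounts for part of the raw activity difference, and the remaining 2.8-fold is the portion attributed to Sm-bound TER1 that lacks the catalytic subunit.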

To gain insights into the functions of Sm and Lsm binding to
telomerase we initially focused on the Sm association. For most characterized
snRNAs, sequences downstream of the Sm-binding site are
critical for Sm loading. As the mature form of TER1 lacks such
sequences, we tested whether the Sm complex was loaded onto the
TER1 precursor before spliceosomal cleavage. Reverse transcription
PCR (RT-PCR) confirmed that the precursor is indeed specifically
associated with the Sm complex, but is undetectable in Lsm
immunoprecipitations (Fig. 2a).

As the spliceosome contains Sm complexes, the TER1-Sm interaction
may reflect binding of the spliceosome to the TER1 precursor.
To test whether Sm proteins bind TER1 directly, we generated constructs
with either a mutant 5'-splice-site or a deletion of the intron.
Both mutant RNAs co-immunoprecipitated with Smb1 (Fig. 2b). In
contrast, replacing the Sm-binding sequence with a random sequence
(ter1-sm6 mutant) reduced Sm association by 22-fold (Fig. 2c).
Similarly, Lsm association was undetectable for ter1-sm6 (Supplementary
Fig. 3a). We therefore surmised that Sm and Lsm proteins
directly bind to the previously identified site in TER1.

We next examined the effect of Sm binding on 3'-end processing by
the spliceosome. Loss of Sm binding in the ter1-sm6 mutant resulted
in a sevenfold reduction in the processed form (Fig. 2d). Furthermore,
a series of deletion mutants within the Sm site caused progressive
inhibition of TER1 cleavage (Supplementary Fig. 3b), but not TER1
splicing (Supplementary Fig. 3c). Finally, introducing an eight-nucleotide
spacer between the Sm site and 5'-cleavage-site also
impaired processing (Fig. 2e). In summary, weakening or abolishing
Sm association with the TER1 precursor reduces spliceosomal cleavage,
indicating that Sm proteins promote 3'-end processing of TER1.

A conserved feature among yeast and mammalian telomerase RNAs
is the post-transcriptional hypermethylation of the 5'-cap into a
2,2,7-trimethyl guanosine (TMG) form. Sm proteins were first implicated
in promoting cap hypermethylation on U2 snRNA in Xenopus
extract. It was later shown in vitro that TMG-capping of human U1
requires the presence of SmB/B'-SmD3 (refs 4, 20). A screen for physical
association with Sm proteins led to the identification of the methylase
Tgs1 in budding yeast. To elucidate the roles of Sm and/or Lsm in the
hypermethylation of the 5'-cap on TER1, we tested which, if any, of
these proteins interact with S. pombe Tgs1 (ref. 22) by two-hybrid
analysis. Smd proteins scored positive, with Smd2 displaying the
strongest interaction, and the other Sm proteins and all Lsm proteins
showing weak or no interaction (Supplementary Fig. 4a). We next
examined whether preventing Sm binding to TER1 affects cap
hypermethylation. Whereas wild-type TER1 was readily precipitated
Figure 2 | Sm proteins associate with TER1 precursor and promote
spliceosomal cleavage. a, RNA from anti-c-Myc immunoprecipitates was
analysed by RT-PCR using primers in the first and second exon (primers
represented by arrows in the schematic below the gel) to amplify the precursor
form (upper panel). The primer pair also amplifies the spliced form (lower band in
Sm immunoprecipitates). A primer pair in the first exon was used to visualize all
forms of TER1 combined (lower panel). b, Sm association does not require
spliceosome assembly on TER1. RT-PCR was performed on RNA purified from
input (in) and anti-c-Myc immunoprecipitate beads (IP). Primers amplifying
snRNA U1 were used as a positive control. c, The Sm-binding site (upper case)
and 5'-splice-site (5'ss, lower case) for wild-type TER1 and the ter1-sm6 mutant
(MT). Replacing the Sm-binding site on TER1 (ter1-sm6 mutant) compromises
Sm association. RNA recovered from anti-c-Myc immunoprecipitates from
untagged control and Smb1-Myc strains was quantified by real-time PCR. Data
are plotted as enrichment over the untagged control. Error bars, standard error
of triplicate experiments. d, Sm site mutation affects TER1 spliceosomal
cleavage. Total RNA samples were analysed by northern blot for TER1 and
snRNA U1. e, Increasing the distance between the Sm site and 5'-splice-site in
the ter1-spacer mutant (AU,GgccauaugGU) impairs TER1 processing.
Northern blot for TER1 and snoRNA snR101 as loading control.


with a monoclonal antibody against the TMG cap, ter1-sm6 recovery
was at least 25-fold reduced (Fig. 3a and Supplementary Fig. 4b). Only
the cleaved form of TER1 was recovered in TMG immunoprecipitations
from wild-type cells, suggesting that spliceosomal cleavage
precedes hypermethylation (Supplementary Fig. 4c). TER1 was not
TMG-capped in a tgs1Δ strain, confirming that Tgs1 is the enzyme
responsible for TER1 cap hypermethylation (Supplementary Fig. 4d).

In light of the reported increase in telomerase RNA in tgs1Δ budding
yeast, we were surprised to observe a fivefold reduction in mature
TER1 RNA in tgs1Δ compared with wild type in S. pombe (Fig. 3b). In
addition, an increase in the precursor indicated a 3'-end processing
defect. The viability of tgs1Δ cells ruled out a major splicing defect, but
we consistently noted a small reduction in spliceosomal snRNAs isolated
from tgs1Δ cells (Fig. 3b and data not shown). To differentiate
between a processing defect and a direct effect of the TMG cap on
TER1 stability, we mutated the spliceosomal cleavage site and inserted
a hammerhead ribozyme sequence to generate the mutant
ter1-5'ssmut-HH (Supplementary Fig. 4e). In this construct, processing of
TER1 occurs independently of the spliceosome by ribozyme cleavage.
When comparing ter1-5'ssmut-HH levels between wild-type and
tgs1Δ cells, a twofold reduction was observed (Fig. 3b). Taken together,
these results show that tgs1Δ affects TER1 processing by the spliceosome
as well as TER1 stability. Consistent with the exquisite dosage sensitivity
for telomerase RNA in diverse species, this reduction in TER1

Figure 3 | Tgs1 modifies TER1 and is required for normal telomere
maintenance. a, Loss of Sm site compromises TMG cap formation. RT-PCR
amplifying all forms of TER1 and ter1-sm6 mutant from anti-TMG
immunoprecipitation (IP) and input (in) samples; snRNA U1 served as control.
b, Bypass of spliceosomal cleavage reveals functions of Tgs1 in TER1 processing
and stability. Northern blot analysis of TER1, snRNA U1, snR101 and 5.8S
rRNA from total RNA prepared from wild-type and tgs1Δ strains harbouring
either TER1 or the ter1-5’ssmut-HH mutant. An asterisk marks the position of
the TER1 precursor. c, Deletion of tgs1+ causes telomere shortening. Telomere
length was analysed by Southern blotting of EcoRI-digested genomic DNA
from four independent tgs1Δ isolates and an otherwise isogenic tgs1+ strain. A
probe for the rad16 gene was used as a loading control (LC).


resulted in shorter telomeres (Fig. 3c). Neither telomerase activity nor 
Lsm association was reduced beyond the effects expected from the 
reduced steady-state level of TER1 (Supplementary Fig. 4f, g). 

Most TER1 post-spliceosomal cleavage was bound by Lsm2-8, but a
small fraction was associated with Sm proteins (Fig. 1c). To investigate
whether this was indicative of a switch from Sm to Lsm binding, we
examined the distribution of 3’-ends in each immunoprecipitation by
massively parallel sequencing. Around 70% of Sm-bound TER1
post-cleavage terminated precisely at the spliceosomal cleavage site (Fig. 4a
and Supplementary Fig. 5a). Enrichment of this form in the Sm-bound
fraction is consistent with Sm proteins binding the TER1 precursor
and remaining associated with TER1 until after cleavage and cap
hypermethylation have occurred. In contrast, Lsm-associated TER1
predominantly terminated in U3-6, indicating that a switch between
Sm and Lsm binding occurs after spliceosomal cleavage and is
associated with exonucleolytic processing (Fig. 4a and Supplementary Fig. 5b).
Consistent with most telomerase activity being associated with Lsm2-8,
the TER1 3’-end distribution from Trt1 immunoprecipitates was
indistinguishable from that of Lsm-bound TER1.

The observation that loss of Sm binding coincided with the loss of
terminal nucleotides led us to speculate that Lsm2-8 may function in
protecting the 3’-end of TER1 against further exonucleolytic
degradation. To test this, we attempted to generate Lsm deletion strains.
Whereas most Lsm proteins are essential, lsm1Δ and lsm3Δ cells were
viable. Consistent with a protective function for Lsm2-8, the levels of
TER1 and U6 snRNA were reduced approximately fivefold in lsm3Δ
cells (Fig. 4b). No such effect was seen when deleting lsm1, nor was the
level of U1 snRNA reduced in lsm3Δ cells. The 3’-end sequence
distribution for TER1 from total RNA of lsm3Δ cells closely resembled the
Sm-bound fraction in wild type, whereas the Lsm-bound fraction was
selectively lost in the mutant (Fig. 4c and Supplementary Fig. 5c).
The viability of lsm3Δ cells further allowed us to confirm that cap
hypermethylation is unaffected by the absence of Lsm, consistent with
Tgs1 acting on TER1 before Lsm binding (Supplementary Fig. 5d).

To verify independently a role for Lsm proteins in stabilizing TER1,
we took advantage of the observation that Lsm binding requires a
stretch of consecutive uridines. In contrast, Sm binding tolerates other
nucleotides in certain positions of the binding motif, as exemplified
by the Sm-binding site in human U1 snRNA (AAUUUGUG). When
the TER1 Sm site was mutated to reduce the number of consecutive
uridines, the level of mature TER1 was decreased (Fig. 5a). We next
precipitated Smb1, Lsm4 and Trt1 from wild type and strains
containing the ter1-SmU1 mutant. As expected, the mutation had little effect
on the binding of Sm proteins (Fig. 5b). In fact, when normalized for
the lower level of ter1-SmU1 compared with wild type, recovery of
ter1-SmU1 with Smb1 was increased 1.6-fold. In contrast, Lsm
binding was diminished by more than 20-fold. Most surprisingly, the
interaction between the catalytic subunit Trt1 and telomerase RNA was also
compromised in the ter1-SmU1 mutant (Fig. 5b). The normalized
recovery of ter1-SmU1 with Trt1 was 15-fold lower than wild type,
indicating that Lsm binding facilitates Trt1-TER1 association,
possibly by inducing a conformational change in the RNA analogous
to how binding of the p65 protein facilitates telomerase assembly in
Tetrahymena. Consistent with the poor recovery of ter1-SmU1 in
Trt1 immunoprecipitations, in vitro telomerase activity was below the
level of detection (Fig. 5c).

Analysis of the 3’-end sequence distribution for ter1-SmU1 from
total RNA revealed that most of the mutant RNA ends at the cleavage
site (Supplementary Fig. 6). This form constituted close to 90% of
ter1-SmU1 in Smb1 immunoprecipitates. In contrast, Lsm4 and Trt1
immunoprecipitates predominantly recovered RNA ending in -AUUU
and -AUUUG (Supplementary Fig. 6). These results further support
the conclusion that Trt1 preferentially associates with Lsm-bound
telomerase RNA.

Figure 4 | Lsm proteins replace Sm and protect the 3’-end of TER1. a, 3’-End
sequence distribution of TER1 from immunoprecipitation samples.
b, Northern blot analysis from total RNA prepared from wild-type, lsm1Δ and
lsm3Δ strains, quantified relative to wild type for each RNA. c, Specific loss of
Lsm2-8-bound fraction of TER1 in lsm3Δ cells based on 3’-end sequence
analysis from total RNA samples. The wild-type sample from Fig. 1b is included
for comparison.

Figure 5 | Lsm binding to TER1 promotes telomerase assembly and protects
TER1 from degradation. a, Northern blot for TER1. The indicated ratios of
mutant (MT) to wild type (WT) are normalized to the loading control snR101
(LC). b, Northern blot for TER1 and the ter1-SmU1 mutant using RNA
isolated from anti-c-Myc immunoprecipitations performed on extract from


They also confirm the role of Lsm in protecting the 3’-end of TER1 
from further degradation, as diminished Lsm binding coincides with 
an overall reduction in telomerase RNA and a shift towards the form 
that is bound by Sm. 

Taken together, our observations demonstrate that distinct
populations of TER1 molecules associate with the Sm and Lsm complexes and
suggest a sequence of events for TER1 biogenesis (Fig. 5d). The
polyadenylated TER1 precursor is bound by the Sm complex, which
promotes spliceosomal cleavage and subsequent 5’-cap
hypermethylation by recruiting Tgs1. The Sm ring is then replaced by the Lsm2-8
complex, which protects TER1 from exonucleolytic degradation and
promotes binding of the catalytic subunit.

Despite their structural similarity and related binding motifs, Sm
and Lsm complexes have different modes of RNA binding and were
thought to have distinct and non-overlapping sets of target RNAs. The
finding that the TER1 precursor is exclusively associated with the Sm
complex, whereas most mature TER1 is bound by Lsm2-8, revealed
that biogenesis of telomerase RNA involves both Sm and Lsm
complexes. Considering the central roles that Sm and Lsm proteins play in
RNA metabolism, it will be important to determine whether biogenesis
of other non-coding RNAs also involves Sm- and Lsm2-8-bound
stages. Furthermore, it is interesting to note that several human Sm/Lsm
proteins have been reported to co-purify with telomerase,
raising the possibility that these proteins also function in TMG cap
formation and telomerase assembly in metazoans.


METHODS SUMMARY 


Myc epitope tags were integrated at the genomic loci and immunoprecipitations 
were performed in whole-cell extracts with anti-c-Myc antibodies. The different 
forms of telomerase RNA were detected by northern blotting and RT-PCR. The 




strains harbouring Smb1-Myc, Lsm4-Myc or Trt1-Myc as indicated.
c, Telomerase activity assay performed on Trt1 immunoprecipitates from
strains harbouring either wild type or ter1-SmU1. An untagged Trt1 strain was
used as negative control. d, Sequence of events that occur during telomerase
biogenesis.


distribution of 3’-ends was assessed at single nucleotide resolution by preparing 
libraries of oligo(A)-tailed telomerase RNA and massively parallel sequencing. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS


Yeast strains and constructs. The genotypes of all strains used in this study are
listed in Supplementary Table 1. Strains expressing c-Myc-tagged Sm and Lsm
proteins were constructed in strain PP138 as described. Mutants ter1-sm6,
ter1-smAG, ter1-smAUG, ter1-smAU2G, ter1-smAU3G and ter1-5’ssmut-HH were
integrated at the ter1 genomic locus by gene replacement. Other ter1 mutants
were generated in the context of plasmid pJW10 using the QuikChange II XL
site-directed mutagenesis kit (Stratagene) and introduced into PP407, PP694 or
PP695 as described.

Yeast two-hybrid analysis. Yeast two-hybrid analysis used the Matchmaker
GAL4 Two Hybrid System 3 (Clontech). Briefly, tgs1+ cDNA was cloned into
the vector pGBKT7, and each full-length lsm and sm cDNA was cloned into
pGADT7. Plasmids were co-transformed into the yeast strain AH109 and positive
transformants were selected on SD-Leu-Trp plates. Interactions were analysed by
plating threefold serial dilutions of overnight cultures onto
SD-Leu-Trp-His-Ade plates. Plates were incubated for three days at 30 °C.

Telomere length analysis and telomerase activity assay. Cells were propagated
for at least 80-100 generations and telomere length was analysed by Southern
blotting as described. Telomerase activity assays were performed on Sepharose
beads as described after immunoprecipitation from cell extracts of strains
harbouring Myc-tagged Trt1, Sm or Lsm proteins.

Immunoprecipitation and RNA isolation. S. pombe cells were grown in yeast
extract with supplements, and 6 l of cell suspension were collected by
centrifugation at a density of 5 × 10⁶ cells per millilitre. Cells were washed in
TMG(300) (10 mM Tris-HCl, pH 8.0, 1 mM magnesium chloride, 10% (v/v) glycerol,
300 mM sodium acetate), the pellet was resuspended in two packed cell volumes
of TMG(300) plus supplements (5 µg ml⁻¹ chymostatin, 5 µg ml⁻¹ leupeptin,
1 µg ml⁻¹ pepstatin, 1 mM benzamidine, 1 mM DTT, 1 mM EDTA and 0.5 mM PMSF)
and the suspension was frozen in liquid nitrogen. Cells were lysed under liquid
nitrogen in a 6850 cryogenic mill (SPEX CertiPrep) with eight 2 min cycles at
an impactor rate of 10 per second and a 2 min cooling time between cycles. The
lysed cell powder was transferred into a 50 ml tube and allowed to thaw on ice
for 30 min. Cell extracts were cleared by two rounds of centrifugation at
14,000g for 7 min and frozen in liquid nitrogen for storage at −80 °C. The
concentration of proteins in the whole-cell extract was measured by Bradford
protein assay. For c-Myc immunoprecipitation, monoclonal anti-c-Myc antibody
(20 µg, Sigma) was incubated with 150 µl protein A/G agarose slurry
(Calbiochem) in phosphate buffered saline at room temperature for 30 min. Beads
were washed three times with TMG(300) plus supplements and whole-cell extract
(1.2 ml) was added at a concentration of 5 mg ml⁻¹ together with RNasin (40 U,
Promega), Tween 20 (0.1%) and heparin (1 mg ml⁻¹). For immunoprecipitation of
TMG-capped RNAs, anti-TMG antibody (3 µg, Calbiochem) was bound to 50 µl
protein A/G agarose slurry (Calbiochem), washed with TMG(300) and 150 µg total
S. pombe RNA was added in 0.7 ml TMG(300). Samples were incubated on a rotator
at 4 °C for 4 h, then washed three times with TMG(300) plus supplements and
0.1% Tween 20 and once with TMG(50) (as TMG(300) but only 50 mM sodium
acetate). Protease inhibitors were omitted for TMG immunoprecipitations. RNA
was isolated by treatment with proteinase K (2.0 mg ml⁻¹ in 0.5% (w/v) SDS,
40 mM EDTA, 20 mM Tris-HCl, pH 7.5) at 50 °C for 15 min, followed by extraction
with acidic phenol and ethanol precipitation. RNA was then analysed by northern
blotting, RT-PCR and 3’-end sequencing.

RNA analysis. RNA isolation and northern blotting were performed as described
except that Biodyne Nylon Transfer Membrane (Pall Corporation) was used and
samples shown in Fig. 5a were treated with RNaseH in the presence of oligonucleotides
BLoli1043 (AGGCAGAAGACTCACGTACACTGCAC), BLoli1275 and PBoli560
(GCGGAATTC(T)18) to obtain better separation of precursor and mature form. The
TER1 probe was generated as described; other RNAs were detected using
5’-[32P]-labelled DNA oligonucleotides as follows:
GCTGCAGAAACTCATGCCAGGTAAGT (snRNA U1),
CGCTATTGTATGGGGCCTTTAGATTCTTA (snoRNA snR101),
CTTCATCGATGCGAGAGCCAAGAGATCCGT (5.8S rRNA) and
GCAGTGTCATCCTTGTGCAGGGGCCA (snRNA U6).

Semi-quantitative RT-PCR was performed as described previously with
primers BLoli1275 (CGGAAACGGAATTCAGCATGT) and BLoli1020
(CAAACAATAATGAACGTCCTG) amplifying the intron-spanning region, and
PBoli918 (ACAACGGACGAGCTACACTC) and BLoli1006
(CATTTAAGTGCTTGTCAGATCACAACG) amplifying a region in the first exon.
BLoli2051 (GACCTTAGCCAGTCCACAGTTA) and BLoli2101
(ACCTGGCATGAGTTTCTGC) were used to amplify snRNA U1.

For quantitative real-time RT-PCR, reverse transcription for input and
immunoprecipitated RNA was performed with antisense primer BLoli2860
(TGCTCAGACCAAGTGAAAAA) and BLoli2051. Real-time PCR was performed
in triplicate 12.5 µl reactions using Power SYBR Green PCR Master Mix
(Applied Biosystems) according to the manufacturer’s instructions. BLoli2860 and
BLoli2859 (GGATCAAAGCTTTTGCTTGT) were used to amplify the first exon
of TER1. BLoli2051 and BLoli2101 were used to amplify snRNA U1. The
qRT-PCR results were imported into Microsoft Excel and the average value and standard
deviation of triplicate cycle threshold (Ct) values were calculated. Enrichment of
immunoprecipitation is represented by ΔCt (Ct value (immunoprecipitation
sample) minus Ct value (input)) relative to the untagged control samples. Error
bars in the graph represent the positive and negative range of the standard error of
the mean.
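
The ΔCt calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' spreadsheet: the function names are hypothetical, the Ct values are invented, and the conversion of a cycle difference into a fold enrichment assumes that the product doubles in every PCR cycle, which the text does not state.

```python
from math import sqrt
from statistics import mean, stdev


def delta_ct(ip_cts, input_cts):
    """Mean Ct of the immunoprecipitate minus mean Ct of the matching input."""
    return mean(ip_cts) - mean(input_cts)


def fold_enrichment(tagged_ip, tagged_input, untagged_ip, untagged_input,
                    amplification_factor=2.0):
    """Enrichment of the tagged strain over the untagged control.

    ASSUMPTION: each PCR cycle multiplies the product by `amplification_factor`
    (2.0 means perfect doubling), so one cycle of difference is a twofold change.
    """
    ddct = delta_ct(tagged_ip, tagged_input) - delta_ct(untagged_ip, untagged_input)
    return amplification_factor ** (-ddct)


def standard_error(cts):
    """Standard error of the mean for replicate Ct values."""
    return stdev(cts) / sqrt(len(cts))


# Hypothetical triplicate Ct values, invented for illustration only.
enrichment = fold_enrichment(
    tagged_ip=[22.0, 22.0, 22.0], tagged_input=[20.0, 20.0, 20.0],
    untagged_ip=[27.0, 27.0, 27.0], untagged_input=[20.0, 20.0, 20.0],
)
print(enrichment)  # 32.0
```

In this invented example the tagged sample has a ΔCt of 2 cycles and the untagged control a ΔCt of 7 cycles, so the tagged immunoprecipitate is enriched 2^5 = 32-fold over the control under the doubling assumption.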

3’-end cloning. DNase-treated total RNA samples (2.5 µg) or immunoprecipitated
and purified RNA was incubated with poly(A) polymerase (600 U, US Biologicals),
RNase inhibitor (RNasin, 40 U) and ATP (0.5 mM) in 20 µl reactions at 30 °C
for 30 min. The reaction volume was increased to 35.5 µl by the addition of the
oligonucleotide BLoli2327 (CAAGCAGAAGACGGCATACGA(T)18, 125 pmol)
and dNTP mix (25 nmol), and reactions were incubated at 65 °C for 3 min followed
by slow cooling to room temperature. The reaction volume was then adjusted to
50 µl with first strand buffer (Invitrogen), dithiothreitol (5 mM), RNasin (40 U)
and Superscript III reverse transcriptase (200 U, Invitrogen), and reactions were
incubated at 50 °C for 60 min. RNaseH (5 U, NEB) was added and incubation was
continued at 37 °C for 20 min. Aliquots (3 µl) of this reaction were used in PCR
with Taq polymerase (5 U, NEB), primers
(GTTCAGAGTTCTACAGTCCGACGATC##GCAAAATGTTAAAAGGAACG) and BLoli2330
(CAAGCAGAAGACGGCATACGA) (200 nM each, ## represents a two-nucleotide barcode
used for multiplexing) under the following conditions: 3 min at 94 °C followed
by 10 cycles of 30 s at 94 °C, 45 s at 55 °C and 60 s at 72 °C, followed by
7 min at 72 °C. PCR products were purified using the QIAquick PCR Purification
Kit (Qiagen) and eluted with 46 µl elution buffer. In the second round of PCR,
23 µl of the eluted product was amplified with BLoli2329
(AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA) and BLoli2330 (200 nM each)
under the following conditions: 3 min at 94 °C followed by 29 cycles of 30 s at
94 °C, 45 s at 55 °C and 60 s at 72 °C, followed by 7 min at 72 °C. PCR products
were separated by electrophoresis on 1.5% agarose gels, and bands of the correct
size were excised and purified. The concentration of the PCR products was
measured using an Agilent 2100 Bioanalyzer (Agilent Technologies) and further
adjusted to 10 nM for massively parallel sequencing using Illumina sequencing
technology. Reads were analysed using a custom script written in BioPerl to
filter for those that contained the TER1 sequence (GCAAAAN10AACG) and to sort
the reads into different bins based on the two-nucleotide barcodes. The
nucleotide sequence between GCAAAAN10AACG and the oligo(A) sequence resulting
from the poly(A) polymerase treatment represents the end of TER1 and was used to
determine the 3’-end sequence distribution at single nucleotide resolution.
Further analysis and graphs were prepared in Microsoft Excel.
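
The read filtering and binning step can be illustrated with a short Python sketch. It is not the authors' BioPerl script, which is not shown in the text. The read layout (barcode, then the TER1 anchor, then the 3’ end, then the added oligo(A) tail), the function names, the minimum tail length of eight adenosines and the example reads are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def bin_three_prime_ends(reads, min_tail=8):
    """Group TER1 3' ends by barcode.

    ASSUMED read layout (not stated in the text): two-nucleotide barcode, then
    GCAAAA + 10 nt + AACG, then the TER1 3' end, then the added oligo(A) tail.
    `min_tail` is a hypothetical cutoff for calling a run of A's the tail.
    """
    pattern = re.compile(
        r"([ACGT]{2})GCAAAA[ACGT]{10}AACG([ACGT]*?)A{%d,}" % min_tail
    )
    bins = defaultdict(Counter)
    for read in reads:
        match = pattern.match(read.upper())
        if match is None:
            continue  # read lacks the TER1 anchor or a recognisable tail
        barcode, three_prime_end = match.groups()
        bins[barcode][three_prime_end] += 1
    return dict(bins)


def end_fractions(counts):
    """Convert counts of each 3' end into fractions of the bin total."""
    total = sum(counts.values())
    return {end: n / total for end, n in counts.items()}


# Hypothetical reads, invented for illustration only.
anchor = "GCAAAATGTTAAAAGGAACG"
reads = [
    "AC" + anchor + "GGTTT" + "A" * 10,
    "AC" + anchor + "GGTT" + "A" * 10,
    "TG" + anchor + "GGTTT" + "A" * 12,
    "ACGTACGTACGT",  # no TER1 anchor, so it is discarded
]
bins = bin_three_prime_ends(reads)
print(end_fractions(bins["AC"]))
```

One limitation of this sketch is that a 3’ end that naturally terminates in adenosines cannot be told apart from the added tail, so those terminal A's are assigned to the tail.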

Visualizing molecular juggling within a
B12-dependent methyltransferase complex

Derivatives of vitamin B12 are used in methyl group transfer in
biological processes as diverse as methionine synthesis in humans
and CO2 fixation in acetogenic bacteria. This seemingly
straightforward reaction requires large, multimodular enzyme complexes
that adopt multiple conformations to alternately activate, protect
and perform catalysis on the reactive B12 cofactor. Crystal
structures determined thus far have provided structural information for
only fragments of these complexes, inspiring speculation about
the overall protein assembly and conformational movements
inherent to activity. Here we present X-ray crystal structures of a
complete 220 kDa complex that contains all enzymes responsible
for B12-dependent methyl transfer, namely the corrinoid
iron-sulphur protein and its methyltransferase from the model acetogen
Moorella thermoacetica. These structures provide the first
three-dimensional depiction of all protein modules required for the
activation, protection and catalytic steps of B12-dependent methyl
transfer. In addition, the structures capture B12 at multiple locations
between its ‘resting’ and catalytic positions, allowing visualization of
the dramatic protein rearrangements that enable methyl transfer
and identification of the trajectory for B12 movement within the
large enzyme scaffold. The structures are also presented alongside
in crystallo spectroscopic data, which confirm enzymatic activity
within crystals and demonstrate the largest known conformational
movements of proteins in a crystalline state. Taken together, this
work provides a model for the molecular juggling that accompanies
turnover and helps explain why such an elaborate protein framework
is required for such a simple, yet biologically essential reaction.

B12-dependent methyl transfer lies at the heart of methylation
biochemistry and is an essential reaction in human health and
microbial CO2 sequestration. In humans, methionine synthase (MetH)
methylates homocysteine to form methionine to maintain cellular
pools of folate (vitamin B9) and S-adenosylmethionine (AdoMet),
the universal methyl donor. MetH mutation or vitamin B12 deficiency
can cause serious health consequences, including megaloblastic anaemia
and birth abnormalities such as neural tube defects. Acetogenic bacteria,
including M. thermoacetica, use the corrinoid iron-sulphur protein
(CFeSP) and its methyltransferase (MeTr) together to catalyse methyl
transfer in the Wood-Ljungdahl carbon fixation pathway for growth on
CO2 as the sole carbon source.

For both MetH and CFeSP/MeTr, methyltetrahydrofolate
(CH3-H4folate) is the methyl donor, and a protein-bound B12 derivative
(cobalamin for MetH and 5’-methoxybenzimidazolyl cobamide for
CFeSP) is the methyl carrier. In acetogenic bacteria, the
CH3-H4folate methyl group is derived from enzymatic reduction of CO2,
whereas in humans, CH3-H4folate is the predominant circulating form
of the vitamin. Although CH3-H4folate is the common methyl source,
methyl removal from the N5 tertiary amine is chemically challenging
because the product, tetrahydrofolate (H4folate), is a poor leaving
group. Therefore, a particularly powerful nucleophile is required, and
B12 with cobalt in the +1 oxidation state, a Co(I) species dubbed a
‘supernucleophile’, is recruited. Such strong reactivity comes at a price:
reducing the inactive Co(II) state to active Co(I) is thermodynamically
challenging, as the Co(II/I) reduction potential is one of the lowest in
nature, −504 mV in CFeSP and −526 mV in MetH. In CFeSP, an
electron is first delivered from a partner protein to an Fe4S4 cluster
harboured by an activation domain. The electron is then passed to
Co(II) to yield Co(I) (equation (1)), which attacks CH3-H4folate to form
CH3-Co(III) (equation (2)). CFeSP then delivers the methyl group to the
Ni2Fe4S4 active site metallocluster (A-cluster) of acetyl-CoA synthase
(ACS), where it becomes the methyl of acetyl-CoA, and B12 returns to its
nucleophilic Co(I) state.


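\[ [\mathrm{4Fe\text{-}4S}]^{1+} + \mathrm{Co(II)} \rightleftharpoons [\mathrm{4Fe\text{-}4S}]^{2+} + \mathrm{Co(I)} \tag{1} \]
\[ \mathrm{Co(I)} + \mathrm{CH_3\text{-}H_4folate} \rightleftharpoons \mathrm{CH_3\text{-}Co(III)} + \mathrm{H_4folate} \tag{2} \]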


During the catalytic cycle of both MetH and CFeSP/MeTr
(Supplementary Fig. 1), a series of ‘molecular juggling’ acts must be
performed in which domains rearrange to contact the B12 cofactor.
Crystal structures of a MetH B12-binding fragment and CFeSP from
Carboxydothermus hydrogenoformans (ChCFeSP) both depict a
‘resting’ state, where B12 is buried by a protective ‘capping’ domain,
shielded from unwanted chemistry but inaccessible to substrate.
Because methyl transfer uses SN2 substitution, large conformational
changes must ‘uncap’ B12 before chemistry can occur. B12 is ‘uncapped’
in structures of MetH fragments that depict B12 activation, but no
structure has been solved that shows B12- and CH3-H4folate-binding
domains together to illustrate methyl transfer.

To visualize this elusive methyl transfer complex, we determined
a 2.38 Å resolution structure of folate-free CFeSP/MeTr from
M. thermoacetica (Fig. 1 and Supplementary Table 1). The homodimeric
MeTr component (58 kDa) is virtually identical to previous structures
of both MeTr (Supplementary Fig. 2a), root mean squared
deviation (r.m.s.d.) for Cα atoms 0.39 Å, and the analogous MetH
domain that binds CH3-H4folate, r.m.s.d. 1.03-1.08 Å. MeTr and
MetH both use (β/α)8 triosephosphate isomerase (TIM) barrels to
bind and activate CH3-H4folate for nucleophilic attack. Two CFeSPs
are present in the complex, each containing two subunits. The small
subunit (35 kDa) is a TIM barrel which acts as the B12 ‘cap’ in the
ChCFeSP structure, while the large subunit (48 kDa) has three
domains joined by linkers: an amino (N)-terminal Fe4S4 activation
domain (residues 1-57), a TIM barrel domain (residues 93-312) and a
carboxy (C)-terminal B12-binding domain (residues 325-446). With
the exception of the Fe4S4 and B12 domains, discussed below, both
CFeSP copies align well to the ChCFeSP structure, r.m.s.d. 0.81-0.85 Å
(Supplementary Fig. 2b).

Figure 1 | The overall CFeSP/MeTr complex. Ribbon representation of MeTr
homodimer (MeTr) in light and dark pink, CFeSP small subunits in orange,
CFeSP large subunit Fe4S4 domains in teal and cyan, TIM barrel domains in


In the 220 kDa CFeSP/MeTr assembly (Fig. 1), the MeTr
homodimer lies in the centre, with one CFeSP bound on either side. Each MeTr
monomer has a C-terminal α-helix (residues 255-262) protruding
from the TIM barrel rim. Contacts between this helix and its preceding
loop with a CFeSP small subunit helix (residues 191-204) form the
primary interactions between MeTr and CFeSP (Supplementary Fig. 3).
Weak interactions between MeTr and CFeSP Fe4S4 domains have
stabilized these highly flexible domains responsible for B12 activation,
allowing their visualization as bundles of short α-helices connected by
long loops that coordinate the Fe4S4 cubane (Fig. 1 and Supplementary
Figs 4 and 5). The Fe4S4 domains are observed to adopt a variety of
positions that are all too far from the B12 to afford reductive activation.
However, the long and primarily unstructured protein linkers that
connect both the Fe4S4 and B12 domains to the central TIM barrel
must allow for the requisite flexibility for B12 activation
(Supplementary Figs 2b and 6).

B12 domains of both MetH and CFeSP adopt Rossmann-like
architectures that bind B12 in the base-off conformation (Fig. 1).
High B-factors support the notion of flexibility mentioned above
(Supplementary Fig. 7 and Supplementary Table 2), where electron density
for both B12 domains represents a highest occupancy position within
an ensemble, rather than a sole conformation. In both CFeSPs, the
average B12 domain position resides between the ‘capping’ small
subunit TIM barrel and the TIM barrel of a MeTr monomer, which are
adjacent and nearly perpendicular to each other (Fig. 1). On average,
the B12 Co has shifted approximately 6.5 Å away from its ‘resting’
location towards the MeTr folate-binding site. B12 in this structure is
thus positioned ‘en route’ towards catalysis, with approximately 18 Å
remaining to the methyl group of folate modelled into the MeTr active
site, based on an alignment with the folate-bound MeTr structure. In
transitioning between ‘resting’ and ‘en route’ positions, the B12 corrin
ring breaks three interactions with the ‘capping’ domain and forms
new contacts, including an H-bond with Asn 203 of MeTr (Fig. 2a, b).

Given the flexibility suggested by this structural analysis, we
explored whether the B12 domain can sample the 18 Å necessary to
afford turnover within intact crystals, using anaerobic in crystallo
ultraviolet-visible absorption spectroscopy to monitor the state of
B12. In crystallo and analogous solution spectra were collected in

green and blue, and B12 domains in dark green and dark blue. B12 cofactors in
magenta sticks with cobalt as violet spheres. Fe4S4 clusters in spheres: Fe in
orange, S in yellow.


parallel (Fig. 3) for the as-isolated Co(II) form of B12. Reduction to
Co(I) and methylation to CH3-Co(III) were then achieved in crystallo
and in solution, with all spectra matching well-established CFeSP
absorption features. Importantly, these features disappear when
light is passed through the solution surrounding the crystals, indicating
that spectra represent protein in crystals and not protein that may have
been liberated into the solution. Collectively, these data demonstrate
enzymatic transfer of the CH3-H4folate methyl group to CFeSP-bound
B12, evidence that the B12 domain is able to move at least 18 Å to trigger
methyl transfer within the crystal. To our knowledge, this conformational
movement represents the largest observed in a crystallized
protein (Supplementary Discussion). Such dramatic B12 domain movement
is probably facilitated by the fact that CFeSP/MeTr is mostly
composed of rigid TIM barrels that provide all the lattice contacts
(Supplementary Fig. 8). Although their biosynthesis is energetically
expensive, these high molecular mass TIM barrel scaffolds may be
important for B12-dependent methyltransferases to maintain structural
integrity during the conformational gymnastics that alternately enable
activation, protection and catalysis of the highly reactive B12 cofactor.
Thus, despite the small size of the transferred methyl moiety, these large
conformational changes appear to necessitate large enzyme sizes.

Although the folate-free CFeSP/MeTr structure describes large B12
domain movements that ‘uncap’ B12 from the small subunit, it is
interesting to consider why binding of CFeSP to MeTr does not simply
position the B12 domain directly over the MeTr active site. One
explanation is that the structure represents an inactive complex;
however, in crystallo results clearly demonstrate that CFeSP/MeTr
crystals are active. Another explanation posits that an ensemble of
‘en route’ conformations exists when CH3-H4folate is absent, and that
CH3-H4folate binding would shift the conformational equilibrium,
moving B12 closer to the folate-binding site. To obtain experimental
support for this hypothesis, we solved additional CFeSP/MeTr
structures co-crystallized with CH3-H4folate, with and without Ti(III)
citrate as a reductant at 3.03 Å and 3.50 Å resolution, respectively.
Absorption spectroscopy performed on these crystals shows that these
structures represent a substrate form (CH3-H4folate bound, B12 in the
Co(II) state) and a product form (H4folate bound, B12 in the
CH3-Co(III) state) of the complex (Supplementary Fig. 9). Compared with

Figure 2 | Comparison of B12 positions in ‘resting’ ChCFeSP, folate-free
and folate-bound CFeSP/MeTr. a, ChCFeSP (grey ribbons, Co of B12: black
sphere). b, Folate-free CFeSP/MeTr (green ribbons, Co of B12: green sphere)
superimposed with CH3-H4folate-bound MeTr (Protein Data Bank accession
number 2E7F, pink ribbons). c, Folate-bound CFeSP/MeTr (orange ribbons, Co
of B12: orange sphere). Parts a-c are identical in orientation; B12 sticks coloured
as C, ribbon colour; O, red; N, blue; P, orange. d, Superposition of ChCFeSP
(grey), folate-free CFeSP/MeTr (green), and folate-bound CFeSP/MeTr


the folate-free structure, B12 in both folate-bound structures has
indeed moved even closer to the MeTr folate-binding site (by an
average of 7.7 Å) and exhibits new H-bonding features (Fig. 2c). In
these folate-bound structures, the B12 corrin ring has severed all
interactions with the ‘capping’ CFeSP small subunit and contacts only
MeTr residues. Here, asparagine and glutamine residues that line the
MeTr surface appear to participate in an ‘amide hand-off’, sequentially
passing B12 along its trajectory as it progresses towards folate (Fig. 2b-c
and Supplementary Fig. 10).

Interestingly, the terminal amide in this ‘hand-off’, Asn 199, is
strictly conserved in both MeTr and MetH and was previously shown
to switch conformations between folate-free and folate-bound forms”®,
a feature also observed in the CFeSP/MeTr structures presented here
(Fig. 2b, c, e). In apo-MeTr structures, Asn 199 points upwards and out
of the active site, whereas in folate-bound MeTr structures Asn 199
turns down to H-bond with the N’ of folate. Because the N199A mutation
moderately hinders folate binding (20-fold in dissociation constant,
Kd) but dramatically compromises catalytic efficiency (kcat/Km) by
25,000-fold, Asn 199 is thought to be important for formation of the
transition state’®. In our CFeSP/MeTr structures, we observe a new role
for Asn 199 in B12 domain conformational switching: when folate is
absent, Asn 199 points out of the active site, blocking a closer B12
position. However, when folate binds and Asn 199 reorients to
H-bond with folate, space becomes available for B12 to move closer
to the MeTr folate-binding site. Therefore, the position of Asn 199
itself could help shift the conformational equilibrium of the B12
domain, signalling that substrate has bound to MeTr. Asn 199 is an
ideal signal for substrate binding, as it is the only MeTr residue known
to reposition upon folate binding”.
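
To put the two fold changes quoted above on a common scale, they can be converted to approximate free-energy differences with ΔΔG = RT ln(fold change). The following is only an illustrative sketch: the temperature (25 °C) is an assumption, not a value taken from the text, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fold_change_to_ddg(fold, temperature_k=298.15):
    """Free-energy equivalent (kcal/mol) of a fold change in Kd or kcat/Km.

    The default temperature of 298.15 K is an assumption for illustration.
    """
    return R_KCAL * temperature_k * math.log(fold)

ddg_binding = fold_change_to_ddg(20)         # N199A effect on Kd
ddg_catalysis = fold_change_to_ddg(25_000)   # N199A effect on kcat/Km

print(f"binding: {ddg_binding:.1f} kcal/mol")
print(f"catalytic efficiency: {ddg_catalysis:.1f} kcal/mol")
```

Under this assumption the 20-fold change corresponds to roughly 1.8 kcal/mol and the 25,000-fold change to roughly 6 kcal/mol, which is one way to see why the mutation is described as affecting the transition state far more than ground-state binding.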




(orange) structures in a–c, highlighting one helix (thick ribbons) to show
clamping motion (helix axes as straight blue lines) and B12 (sticks) with 12-residue
linker (thick ribbons) to B12 domain (surface) to show swinging motion.
e, Superposition of B12 and CH3-H4folate in d, with Asn 199 shown for CFeSP/MeTr
structures in sticks (C, ribbon colour; O, red; N, blue). f, Same as e, with
2Fo − Fc density in blue (1.0σ) and pink mesh (4.0σ), and Fo − Fc density in
green mesh (3.0σ) for folate-bound CFeSP/MeTr structure. Putative alternative
B12 corrin: cyan. g, Superposition of B12 cofactors and CH3-H4folate in f.

Figure 3 | Methyl transfer activity of CFeSP/MeTr crystals by ultraviolet-visible
absorption spectroscopy. As-isolated spectra (blue lines) for CFeSP/MeTr
crystals and CFeSP in solution similarly have broad features at
approximately 400 and 470 nm arising from the Fe4S4 cluster and the B12 corrin.
Following established protocols'*"°”*??””7, B12 reduction was achieved with
Ti(III) citrate, yielding a sharp 390 nm peak indicative of active Co(I) in both
solution and in crystallo spectra (black lines). Further treatment with CH3-H4folate
yields decreased absorbance at 390 nm and a new peak at 450 nm (red
lines), characteristic of the product complex (protein-bound CH3-Co(III)'*!?*”*).
A control reaction (green line) confirms that turnover does not occur from free
CH3-H4folate without MeTr, and the 450 nm peak indicates that B12 remains
CFeSP-bound (free B12 has a peak at approximately 520 nm instead’**!””).

[Figure 4 panel labels: ‘Resting’ CFeSP, Co(I) protected (ChCFeSP structure); B12 ‘en route’ (this work); B12 equilibrium shifted (this work).]
Figure 4 | Cartoon model of B12-dependent methyl transfer in CFeSP/MeTr.
For simplicity, only one of the two CFeSP heterodimers is shown.
Protein domains are coloured as in Fig. 1, loops represent linkers, red hexagon


Displacement of the B12 domain from its resting position to the position
nearest the folate-binding site can be attributed to two independent
conformational changes within the complex, best described as ‘swinging’
and ‘clamping’ motions (Fig. 2d). The B12 domain can ‘swing’ relative to
the rest of CFeSP (Supplementary Fig. 11), and CFeSP can ‘clamp’ the B12
domain towards the MeTr active site (Fig. 2d and Supplementary
Fig. 12). Despite varying degrees of ‘clamping’ over a range of approximately
14° across the structures, the interface between the CFeSP small
subunit and MeTr is preserved (Supplementary Fig. 13).

Although folate binding shifts the average position of the B12
domain closer to the MeTr folate-binding site, the B12 Co is still too
far for SN2 methyl transfer (Fig. 2e). Intriguingly, a large, continuous
electron density peak is present in 2Fo − Fc, Fo − Fc, and composite
omit maps, emanating from the corrin ring and stretching directly over
the folate-binding site, suggestive of an alternative, low-occupancy
corrin conformation (Fig. 2f and Supplementary Fig. 14). A trial
refinement of a putative corrin ring at 40% occupancy satisfies the
Fo − Fc difference maps (Supplementary Fig. 15), positioning B12 over
the folate-binding site.

The multiple positions of B12 captured here (Fig. 2g) highlight the
conformational flexibility of the CFeSP/MeTr scaffold and provide a
framework to understand the molecular juggling of domains during
B12-dependent methyl transfer (Fig. 4). Before MeTr binding, the
CFeSP B12 domain rests against the ‘capping’ small subunit, as in
the structure of ChCFeSP”, with reactive Co(I) of B12 protected (‘resting’
state). From this conformation, either the ‘cap’ or the B12 domain
must move to allow substrate access. Our folate-free CFeSP/MeTr
structure indicates that upon MeTr binding, the B12 domain becomes
‘loosened’ and flexible, adopting an ensemble of conformations that lie
en route towards the MeTr active site. Here, the reactive B12 species
would be protected by the CFeSP small subunit and MeTr TIM barrels.
CH3-H4folate binding to MeTr accompanied by movement of Asn 199
shifts the equilibrium of B12 domain conformers, placing B12 closer to
folate, as in our folate-bound CFeSP/MeTr structures, with B12 protection
afforded by MeTr. It is notable that even after CH3-H4folate
binds, the major B12 position is still not directly over the folate methyl
group, as such a position is expected to be transient. After methyl
transfer, the B12 domain can return to the small subunit to ‘re-cap’
the methylated B12 product, protected by the small subunit TIM barrel.

Overall, our data indicate that B12 domain movement is not a simple,
two-state switch between ‘resting’ and ‘catalytic’ conformations.
Instead, a flexible B12 domain samples an ensemble of conformations,
where subtle shifts of the conformational equilibrium place B12 progressively
closer to the active site, thereby increasing the population of
conformers capable of methyl transfer without obstructing substrate
access or hindering domain movement. This model is consistent with
MetH studies where ligation, alkylation and redox state of the B12
cobalt can favour/disfavour various binding modes, alternately shifting
the equilibrium of conformers for ordered domain rearrangements
during the reaction cycle’*'***”*. We further identify MeTr residues
that contact B12 along its trajectory, ending with Asn 199. In MetH the
B12 ligating residue His 759 has been shown to play a dual role in
catalysis and in signalling conformational shifts’’. The strictly conserved,
folate-binding Asn 199 of MeTr could similarly play a dual role
in both catalysis and conformational signalling. We thus expect this
model for dynamic domain juggling, communicated by residues
involved in substrate and cofactor binding, to be a common theme
in methyl transfer between the B vitamins folate and B12.

[Figure 4 panel labels, continued: folate-free complex; CH3-H4folate-bound complex; ‘folate-on’ conformation (transient); methyl transfer reaction; product bound, CH3-Co(III) protected; H4folate-bound complex, B12 equilibrium shifted (this work).]
is B12, blue rectangle is folate and transferred methyl group is shown as a yellow
sphere. Curved arrows denote ‘swinging’ and ‘clamping’ motions.


METHODS SUMMARY 


CFeSP and MeTr were expressed and purified anaerobically from M. thermoacetica 
ATCC 39073 and from recombinant Escherichia coli, respectively. Crystals were 
grown anaerobically by hanging drop vapour diffusion. Diffraction data were 
collected at 24ID-C at the Advanced Photon Source, Argonne National Laboratory, 
and 5.0.2 and 8.2.2 at the Advanced Light Source, Lawrence Berkeley National 
Laboratory. Structures were solved by molecular replacement. Data collection and 
refinement statistics are presented in Supplementary Table 1, and representative 
electron densities for protein domains and for cofactors/substrate are shown in 
Supplementary Figs 16-21. Solution and in crystallo absorption spectra were collected 
as described in the text and in the Methods. 


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS


Protein purification. CFeSP was expressed and purified anaerobically from M.
thermoacetica ATCC 39073 as described”', except for the following modifications.
CFeSP was purified from cell extracts using DEAE-cellulose and high-resolution
Q-Sepharose anion exchange chromatography followed by phenyl-Sepharose
hydrophobic interaction chromatography. Fractions containing CFeSP were concentrated
and buffer exchanged using Amicon ultracentrifuge concentrators in the
anaerobic chamber. MeTr was expressed and purified anaerobically from recombinant
E. coli as described®. Concentrations of CFeSP and MeTr protein samples
were determined using the Rose-Bengal method” and kept in storage buffer:
50 mM Tris-HCl, pH 7.6, 100 mM NaCl, 2 mM dithiothreitol.

Crystallization. Crystals of the folate-free CFeSP/MeTr complex were grown by
hanging-drop vapour diffusion in an anaerobic chamber (Coy Laboratories) at
room temperature by adding 1 µl of precipitant (100 mM Bis-Tris, pH 6.5,
100 mM calcium acetate, 9% PEG 5,000 monomethyl ether, 20% glycerol) to
2 µl of an equimolar mixture of CFeSP and MeTr (approximately 250 µM monomer
for each), over a 0.5 ml reservoir solution of precipitant. Large, brown, rod-shaped
crystals appeared overnight. Crystals were looped and cryo-cooled in liquid
nitrogen anaerobically before collection of X-ray diffraction data at 100 K.
Crystals of CFeSP/MeTr co-crystallized with the CH3-H4folate substrate were
obtained in the same manner as above, except the protein solution also contained
CH3-H4folate at 1 mM concentration. Crystals of CFeSP/MeTr co-crystallized
with both CH3-H4folate and Ti(III) citrate as a reductant were obtained in the
same manner, except the precipitant solution also contained Ti(III) citrate at 3 mM
concentration.
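
As a rough illustration of what the 1 µl + 2 µl drop described above contains at the moment of mixing, the sketch below applies simple dilution arithmetic. It is not part of the published protocol: the helper name is hypothetical, and contributions from the protein storage buffer and subsequent vapour equilibration are ignored.

```python
def drop_concentration(stock, added_volume_ul, total_volume_ul):
    """Concentration of a component immediately after mixing the drop."""
    return stock * added_volume_ul / total_volume_ul

precipitant_ul = 1.0   # volume of precipitant added to the drop
protein_ul = 2.0       # volume of the CFeSP/MeTr mixture
total_ul = precipitant_ul + protein_ul

initial_drop = {
    "Bis-Tris (mM)": drop_concentration(100, precipitant_ul, total_ul),
    "calcium acetate (mM)": drop_concentration(100, precipitant_ul, total_ul),
    "PEG 5,000 MME (%)": drop_concentration(9, precipitant_ul, total_ul),
    "glycerol (%)": drop_concentration(20, precipitant_ul, total_ul),
    "CFeSP (uM monomer)": drop_concentration(250, protein_ul, total_ul),
    "MeTr (uM monomer)": drop_concentration(250, protein_ul, total_ul),
}

for component, value in initial_drop.items():
    print(f"{component}: {value:.1f}")
```

The precipitant components start at one-third of their reservoir concentration and the proteins at two-thirds of their stock concentration; the drop then concentrates as water vapour transfers to the reservoir.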

Structure determination of folate-free CFeSP/MeTr structure. Two X-ray diffraction
data sets were collected for the folate-free CFeSP/MeTr structure. A
lower-resolution data set (3.3 Å) was collected at the Advanced Light Source
(ALS) beam line 5.0.2 (λ = 1.1000 Å), and a higher-resolution data set (2.38 Å)
was later collected at the Advanced Photon Source (APS) beam line 24ID-C
(λ = 0.9792 Å).

The initial data set to 3.3 Å resolution was processed in HKL2000 and
Scalepack”’. The structure was solved by molecular replacement in Phaser”®, using
individual structures of MeTr’® (Protein Data Bank accession number 2E7F) and
ChCFeSP? (Protein Data Bank accession number 2H9A) lacking its B12 domain as
independent search models. Two CFeSP/MeTr complexes (approximately
220 kDa each) were found in the asymmetric unit, and crystals belonged to the
space group P2,2,2, with unit cell dimensions (Å): a = 137.42, b = 159.87 and
c = 241.92. Iterative rounds of refinement with residue-grouped B-factors were
performed in CNS*! and PHENIX”, with model building in Coot**. The four B12
domains present in the asymmetric unit were kept as a polyalanine model. Final
R-factors for working and test reflections (Rwork and Rfree) were 29.2% and 33.7%,
respectively, when refinement of the structure to higher resolution began. Data
collection and refinement statistics for this data set are shown in Supplementary
Table 1. Ramachandran analysis was performed in PROCHECK™: 74.0% of residues
resided in the most favoured region, with 21.2% additionally allowed, 3.1%
generously allowed and 1.7% disfavoured.

The data set to 2.38 Å resolution was processed in HKL2000 and Scalepack”.
Although this crystal formed in similar conditions as the crystal which gave the
3.3 Å resolution data set, the space group was now P2,2,2, with unit cell dimensions
(Å): a = 125.71, b = 242.84 and c = 79.67. The structure for this crystal was
thus solved by molecular replacement in Phaser”® using the MeTr homodimer and
CFeSP heterodimers lacking B12 domains from the previously refined model of the
3.3 Å resolution structure as independent search models. Only one CFeSP/MeTr
complex was present in the asymmetric unit. Iterative rounds of refinement were
performed in CNS* and PHENIX”, with model building in Coot**. Translation/
libration/screw refinement was performed in latter refinement rounds with seven
translation/libration/screw groups: the MeTr homodimer (chains A and B), the
Fe4S4 domain of one CFeSP large subunit (chain C), the TIM domain of chain C
with one small subunit (chain D), the B12 domain of chain C, the Fe4S4 domain of
the second large subunit (chain E), the TIM domain of chain E with the second
small subunit (chain F), and the B12 domain of chain E. Data collection and
refinement statistics are shown in Supplementary Table 1, and average B-factors
for each domain of the final model are given in Supplementary Table 2.
Ramachandran analysis was performed in PROCHECK™: 90.1% of residues reside
in the most favoured region, with 9.5% additionally allowed, 0.3% generously
allowed and 0.2% disfavoured. The final model contains residues 1-262 (of 262)
for both MeTr chains (A and B), residues 2-442 (of 446) for both CFeSP large
subunit chains (C and E) and residues 1-323 (of 323) for both CFeSP small subunit
chains (D and F).
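
The two unit cells reported above can be sanity-checked with a Matthews coefficient estimate, V_M = V_cell / (Z × M). This is only a sketch and not a calculation taken from the paper: it assumes orthorhombic cells (so V = a·b·c), four asymmetric units per cell (as in primitive orthorhombic space groups with 222 point symmetry), roughly 220 kDa per complex as stated in the text, and the conventional 1.23 Å³/Da constant for the solvent-fraction approximation. The function names are hypothetical.

```python
def matthews_coefficient(a, b, c, asu_per_cell, mass_da_per_asu):
    """Matthews coefficient (cubic angstroms per dalton) for an orthorhombic cell."""
    cell_volume = a * b * c  # valid only when all cell angles are 90 degrees
    return cell_volume / (asu_per_cell * mass_da_per_asu)

def solvent_fraction(v_m):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - 1.23 / v_m

COMPLEX_MASS_DA = 220_000  # approximate mass of one CFeSP/MeTr complex
ASU_PER_CELL = 4           # assumption: primitive orthorhombic, point group 222

# 3.3 angstrom data set: two complexes per asymmetric unit
vm_low_res = matthews_coefficient(137.42, 159.87, 241.92,
                                  ASU_PER_CELL, 2 * COMPLEX_MASS_DA)
# 2.38 angstrom data set: one complex per asymmetric unit
vm_high_res = matthews_coefficient(125.71, 242.84, 79.67,
                                   ASU_PER_CELL, 1 * COMPLEX_MASS_DA)

for label, vm in (("3.3 A cell", vm_low_res), ("2.38 A cell", vm_high_res)):
    print(f"{label}: V_M = {vm:.2f} A^3/Da, solvent ~ {solvent_fraction(vm):.0%}")
```

Under these assumptions both crystal forms come out near 3 Å³/Da, which is within the range commonly seen for protein crystals and is consistent with the stated number of complexes per asymmetric unit.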

Except for the Fe4S4 and B12 domains, the entire structure is composed of TIM
barrels for which the electron density is well-defined (Supplementary Fig. 16).
Electron density is weaker for the Fe4S4 domains (Supplementary Figs 5, 17 and
19), consistent with the fact that these domains exhibit higher B-factors
(Supplementary Table 2 and Supplementary Fig. 7). However, reasonable electron
density is present for the main chain and most side chains, allowing us to build a
model for this domain. Still, several side chains of the Fe4S4 domains lack clear
electron density; thus, for these residues, atoms were truncated past the Cβ atom
(chain C, 12 residues truncated and chain E, 16 residues truncated, out of 56 total
residues).

Although B-factors are high and electron density is weak for the B12 domains in
general, electron density for the B12 cofactors is unambiguous (Supplementary Fig.
20), and density is also clear in several helical regions, including those near B12
(Supplementary Fig. 18). Because the structure of a CFeSP B12 domain bound with
B12 was already known’, we used the clear electron density of the B12 cofactor and
the resolvable helices to position the B12 domain during model building. Still,
many side chains of the CFeSP B12 domains lacked clear electron density, and
thus for these residues, atoms were truncated past the Cβ atom (chain C, 50
residues truncated and chain E, 59 residues truncated, out of 118 total residues).

The B12 cofactor in the final model contains 5,6-dimethylbenzimidazole as the
lower ligand moiety, as in cobalamin. Although active with cobalamin*’, previous
studies have shown that CFeSP isolated from M. thermoacetica harbours an
unusual B12 derivative that contains 5’-methoxybenzimidazole as the lower ligand
instead**. However, disorder of the B12 cofactor and B12 domain owing to thermal
motion of these regions in the CFeSP/MeTr crystal resulted in weak electron
density for substituents of the benzimidazolyl ring (Supplementary Figs 18 and
20). Therefore, we cannot confirm the presence of this unusual B12 derivative from
our crystallographic studies, and we have thus modelled cobalamin as the form of
B12 in the structure.

Previous spectroscopic studies” in addition to the crystal structure of ChCFeSP”
have indicated that a water molecule coordinates the central cobalt of B12 in the
as-isolated CFeSP. Here, Co(II) is the major species and is expected to be five-coordinate.
However, because of disorder we do not observe electron density to
suggest a water molecule bound to cobalt (Supplementary Figs 18 and 20).
Accordingly, we have not modelled a water molecule.

Structure determination of folate-bound CFeSP/MeTr structures. For crystals
grown with CH3-H4folate, X-ray diffraction data were collected at APS beam line
24ID-C to 3.50 Å resolution at λ = 1.6039 Å to optimize the cobalt peak anomalous
signal. For crystals grown with both CH3-H4folate and Ti(III) citrate, X-ray diffraction
data were collected at ALS beam line 8.2.2 to 3.03 Å resolution at
λ = 1.0000 Å. The structures were solved by molecular replacement using the
MeTr homodimer and CFeSP heterodimers lacking Fe4S4 and B12 domains from
the folate-free 2.38 Å CFeSP/MeTr structure as independent search models.
Refinement of the folate-free CFeSP/MeTr structure against either folate-bound
X-ray data set was not sufficient to solve the structure, as the unit cell dimensions
were markedly different (Supplementary Table 1). After molecular replacement,
one CFeSP/MeTr complex was present in the asymmetric unit, and omit electron
density clearly indicated the presence of bound folate (Supplementary Fig. 21).
Iterative rounds of refinement were performed in CNS*’ and PHENIX”, with
model building in Coot**. The same test set of reflections for Rfree calculations
was used for both folate-bound data sets. Data collection and refinement statistics
are shown in Supplementary Table 1. Ramachandran analysis was performed in
PROCHECK**: for the CH3-H4folate-only structure, 89.6% of residues reside in
the most favoured region, with 9.8% additionally allowed, 0.3% generously allowed
and 0.2% disfavoured. For the CH3-H4folate with Ti(III) citrate structure, 89.5% of
residues reside in the most favoured region, with 10.0% additionally allowed, 0.2%
generously allowed and 0.3% disfavoured. The final models both contain folate, B12
and residues 1-262 (of 262) for MeTr chains (A and B), residues 2-442 (of 446) for
CFeSP large subunit chains (C and E) and residues 1-323 (of 323) for CFeSP small
subunit chains (D and F). As with the folate-free structure, several side chains of the
Fe4S4 domains for both folate-bound structures lacked clear electron density; thus
for these residues, atoms were truncated past the Cβ atom (chain C, 15 residues
truncated and chain E, 18 residues truncated, out of 52 total residues). Similarly,
many side chains of the B12 domains lacked electron density, and thus for these
residues, atoms were truncated past the Cβ atom (chain C, 51 residues truncated
and chain E, 78 residues truncated, out of 118 total residues). The liganded/oxidation
states of folate and B12 in these structures were determined by use of a
microspectrophotometer (see below).
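
The choice of λ = 1.6039 Å for the cobalt anomalous signal can be checked by converting wavelength to photon energy, E = hc/λ. The sketch below is illustrative only: the cobalt K-edge value of about 7.709 keV is a tabulated figure that does not appear in the text, and the function name is hypothetical.

```python
HC_KEV_ANGSTROM = 12.39842   # Planck constant times speed of light, in keV * angstrom
CO_K_EDGE_KEV = 7.709        # approximate cobalt K absorption edge (tabulated value)

def wavelength_to_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

# Wavelengths quoted in the Methods for the four data sets
wavelengths = {
    "folate-free, ALS 5.0.2": 1.1000,
    "folate-free, APS 24ID-C": 0.9792,
    "CH3-H4folate, APS 24ID-C": 1.6039,
    "CH3-H4folate + Ti(III), ALS 8.2.2": 1.0000,
}

energies = {name: wavelength_to_kev(w) for name, w in wavelengths.items()}
for name, energy in energies.items():
    offset_ev = (energy - CO_K_EDGE_KEV) * 1000
    print(f"{name}: {energy:.3f} keV ({offset_ev:+.0f} eV relative to Co K-edge)")
```

With these numbers, 1.6039 Å corresponds to about 7.73 keV, a few tens of eV above the tabulated cobalt K-edge, whereas the other data sets were collected several keV above it.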

Solution and in crystallo ultraviolet-visible absorption spectroscopy to determine
enzyme activity in crystallo. Titanium(III) citrate (100 mM in 50 mM Tris,
pH 7.6) was prepared”, and (6S)-5-methyl-5,6,7,8-tetrahydrofolate (CH3-H4folate)
containing one glutamate tail was purchased from Schircks Laboratories. As-isolated,
reduced and methylated CFeSP samples in solution were prepared in a
room-temperature anaerobic chamber (MBraun) following similar procedures to
those previously described'*"’***”°’. Briefly, purified CFeSP (20 µM) was used
for the as-isolated sample, CFeSP mixed with Ti(III) citrate (1 mM) was used for
the reduced sample, and CFeSP mixed with equimolar MeTr, Ti(III) citrate
(1 mM), and CH3-H4folate (1 mM) was used for the methylated sample. Spectra
were taken using a Nanodrop 2000c (Thermo Scientific) in a quartz cuvette or on
the sample stage in the anaerobic chamber directly after mixing; identical solutions
lacking CFeSP were used as blanks.

To obtain in crystallo absorption spectra, CFeSP/MeTr crystals in as-isolated,
reduced and methylated forms were prepared in a similar fashion. In a room-temperature
anaerobic chamber (Coy Laboratories), crystals were looped into a
2 µl drop, which was placed on a cover slide and contained one of the following
three solutions for as-isolated, reduced and methylated samples, respectively: well
solution, well solution with Ti(III) citrate (10 mM) and well solution with Ti(III)
citrate (10 mM) and CH3-H4folate (1 mM). A ring of epoxy surrounding each
drop was applied to the cover slide, and a second cover slide was placed on top,
sandwiching the drops within a uniform distance separation and sealing the
crystals within an anaerobic environment. Upon curing of the epoxy, crystals were
brought out of the anaerobic chamber and mounted on an XZ translation stage
(Newport, UMR8.25 and SM-13) in a fibre-optic-coupled microspectrophotometer
(Ocean Optics, Jaz) with 40 mm diameter reflective objectives (Optique Peter) and
a deuterium-halogen lamp (DH2000-BAL, Ocean Optics), similar to that previously
described**””. Stray light was blocked with blackout material. The light focus
was coarsely aligned to the crystals by visual inspection and then finely aligned by
monitoring light transmission in real time. Data were acquired at room temperature
with the SpectraSuite software (Ocean Optics). The background transmission
was measured through the solution immediately surrounding the crystals.
The dark current was measured with the light shuttered off. Sample, reference and
dark current spectra were acquired by averaging 10-50 scans with total exposure
times of 90-1000 ms. Experiments were completed within 60 min of sample preparation,
and crystals remained intact over the course of the experiment, as
observed using a microscope after data collection.

To generate Fig. 3, absorbance spectra were scaled relative to each other to
account for variable crystal sizes and path lengths, where absolute peak absorbances
did not exceed one absorbance unit.
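
The sample, reference and dark-current spectra described above are conventionally combined into absorbance as A = −log10[(S − D)/(R − D)] at each wavelength. The sketch below shows that standard relationship together with one simple way of scaling spectra to a common peak height. The scaling rule, the function names and the toy numbers are assumptions for illustration; the text does not specify how the scaling for Fig. 3 was performed.

```python
import math

def absorbance(sample, reference, dark):
    """Absorbance per wavelength bin from raw detector counts.

    sample:    counts through the crystal
    reference: counts through the surrounding solution
    dark:      counts with the light shuttered off
    """
    return [
        -math.log10((s - d) / (r - d))
        for s, r, d in zip(sample, reference, dark)
    ]

def scale_to_peak(spectrum, target_peak=1.0):
    """Rescale a spectrum so that its maximum equals target_peak (assumed rule)."""
    peak = max(spectrum)
    return [value * target_peak / peak for value in spectrum]

# Toy counts at three wavelengths (made-up values, not measured data)
sample_counts = [500.0, 300.0, 700.0]
reference_counts = [900.0, 900.0, 900.0]
dark_counts = [100.0, 100.0, 100.0]

spectrum = absorbance(sample_counts, reference_counts, dark_counts)
scaled = scale_to_peak(spectrum)
print([round(a, 3) for a in spectrum])
print([round(a, 3) for a in scaled])
```

Subtracting the dark current from both the sample and the reference removes the detector offset before the ratio is taken, which matters most where the lamp intensity is low.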

In crystallo ultraviolet-visible absorption spectroscopy on folate-bound
CFeSP/MeTr crystals to determine liganded/oxidation state of bound B12
and folate. Ultraviolet-visible absorption spectra were collected on a microspectrophotometer
at 100 K for crystals of folate-free CFeSP/MeTr, crystals that
were grown in the presence of CH3-H4folate only and for crystals grown in the
presence of both CH3-H4folate and Ti(III) citrate (Supplementary Fig. 9). The
spectra were compared with the analogous solution spectra (Fig. 3). The spectrum
for the folate-free crystal was similar to the spectrum of CFeSP alone in solution,
with broad features at approximately 400 and 470 nm indicative of the Fe4S4
cluster and B12 primarily in the Co(II) state. The spectrum for the crystal grown
with CH3-H4folate matched the spectrum of folate-free crystals, indicating that
B12 had remained primarily in the Co(II) state, and turnover had not occurred.
However, the spectrum for the crystal grown with both CH3-H4folate and Ti(III)
citrate was markedly different and contained a peak at 450 nm, indicating that B12
in these crystals was methylated to the CH3-Co(III) state. Based on these data, we
modelled the methyl group on folate in the structure co-crystallized with CH3-H4folate
only, whereas we modelled the methyl group bound to Co of B12 for the
CH3-H4folate/Ti(III) citrate structure. Without these spectroscopic data, assignment
of the location of the methyl group would otherwise have been prevented by
the resolution limits of the data.

THE INTERACTION MAP


As increasing numbers of protein-protein interactions are identified, researchers are
finding ways to interrogate these data and understand the interactions in a relevant context.

Multiple replicated experiments and sophisticated statistics reveal 497 interactions between 16 HIV proteins (blue) and hundreds of human factors.


BY MONYA BAKER 

Around the time that scientists celebrated the completion of the draft
sequence of the human genome,
papers from two separate groups described
results of another project that tested all the
possible pairings of thousands of yeast proteins
to see whether they interact’.

The importance of protein-protein
interactions is beyond dispute. Little happens
in a cell without one protein ‘touching’
another. Whether a cell divides, secretes
a hormone or triggers its own death,
protein-protein interactions make the event
happen. Consequently, comprehensive maps
showing which proteins came together in a
yeast cell were much anticipated.

But the results took scientists aback. 
Although the two research groups had 
explored the full collection of proteins in the 
same organism using the same yeast two- 
hybrid (Y2H) assay, the two papers found 
fewer than 150 interactions in common — 
only 10% of the findings that either team 
dubbed high quality. Most scientists regarded 
the results as so riddled with artefacts that 
they were useless. 

“As you can imagine, people were extremely
critical. They just couldn't believe that you
would get such different results when you
were studying the same thing,” recalls Peter
Uetz, who studies protein interactions at the
Center for the Study of Biological Complexity
at Virginia Commonwealth University in
Richmond, and was a co-author on one of the
papers’. Even today, many researchers look
askance at the Y2H assays used in the studies.
But Marc Vidal, a systems biologist at the
Dana-Farber Cancer Institute in Boston, Massachusetts,
says that the technique has come a
long way in a decade. Not only have researchers
found ways to recognize and reduce
false positives, but gruelling follow-up studies
show that the startlingly low overlap between
the two reports was not because the assays
found so many interactions that do not exist,
but because they missed so many that do*.

Understanding these interactions is as
important as ever. Protein interactomes,
maps of protein interactions, are raw fuel
for systems biologists. Promising techniques
to block protein-protein interactions in cancer
cells and for other diseases have launched
a string of biotechnology deals. Considering
disease in terms of protein-protein interactions
rather than individual genes and proteins
could help to untangle jumbled observations.
For example, mutations in the same protein
could lead to different diseases by disrupting
different interactions. Similarly, mutations in
different proteins that disrupt the same interaction
could lead to the same disease.

A good reference map of interactions
would be like completing the human genome
sequence, says Vidal, and could spawn further
efforts to study genetic variation and function.
A validated network would give scientists a
jumping-off point for more experiments. “My
guess is that as these networks grow, we will
get more elaborate ways of understanding
where these interactions take place, when and
why,” he says. “We are getting a sense of a cell’s
organizational self by doing this.”


SCREENING SYSTEMS 

First described in 1989, the Y2H assay tests 
the interactivity of pairs of proteins by attach- 
ing them to two halves of a transcription 
factor’. If the proteins come together, the 
transcription factor is reformed, activating 
reporter genes and allowing the yeast to grow. 
Companies including Hybrigenics in Paris
and Dualsystems Biotech in Zurich, Switzerland,
run Y2H as a service.

“Yeast two-hybrid has an enormous advan- 
tage, which might also be a disadvantage: it 
can detect low affinity,” says Erich Wanker, 
a neuroproteomics researcher at the Max 
Delbrück Center for Molecular Medicine in
Berlin, and co-editor of a book on the topic’. 
In other words, the assay can identify weak, 
transient pairings such as those that perpetu- 
ate cell signalling. But it also detects proteins 
that randomly bump together. This bumping 
has led to almost philosophical discussions. 
“At what point do we really believe that it’s an 
interaction?” asks Wanker. 

Scientists have also found ways to detect 
and avoid many sorts of false positive. Arte- 
facts from ‘sticky’ proteins, which bind non- 
specifically to other proteins, can be identified 
and excluded. Growth that is promoted by 
a single introduced protein rather than a 
reformed transcription factor can also be 
recognized. 

Precise systems also exist to make sure that
all desired combinations are tested. Rather
than transfecting the same yeast cell with
genes for two potential interaction partners,
yeast are transfected with individual genes,
mated in pools and their progeny assayed for
growth. Robotic systems mix yeast precisely
and run multiple replicates of each assay. The
number of times that the same interaction is
seen becomes part of a quality score. “Our view
is that Y2H can give reliable and reproducible
results,” says Wanker.
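
One simple way to picture how replicate counts can feed a quality score is to record the fraction of replicate screens in which each pair scored positive. The sketch below is a hypothetical illustration, not the scoring scheme used by any of the groups mentioned: the function names, the protein labels and the treatment of pairs as unordered are all assumptions.

```python
from collections import Counter

def pair_key(protein_a, protein_b):
    """Order-independent key for a protein pair (an assumed simplification)."""
    return tuple(sorted((protein_a, protein_b)))

def replicate_support(replicates):
    """Fraction of replicate screens in which each pair scored positive."""
    counts = Counter()
    for hits in replicates:
        # A set ensures each pair is counted at most once per replicate.
        counts.update({pair_key(a, b) for a, b in hits})
    return {pair: seen / len(replicates) for pair, seen in counts.items()}

# Hypothetical hits from four replicate screens (protein names are placeholders).
replicates = [
    [("A", "B"), ("C", "D")],
    [("B", "A"), ("E", "F")],
    [("A", "B"), ("C", "D")],
    [("A", "B")],
]

support = replicate_support(replicates)
for pair, fraction in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, f"{fraction:.2f}")
```

In this toy data the pair seen in every replicate scores 1.00, while a pair seen once scores 0.25; a real pipeline would combine such a count with other evidence rather than use it alone.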

Still, some interactions will not be
observed in Y2H. For example, the interacting
proteins have to allow the two halves of
the transcription factor to reunite, and the
proteins must be able to reach the nucleus to
activate the reporter gene. Thus, interactions
with membrane- or organelle-specific proteins
are invisible.

Interaction maps can help to explain protein
function and identify new ways to fight disease.

Besides Y2H, lower-throughput tests 
in mammalian cells can be used to screen 
interactions; these tests include luminescence- 
based mammalian interactome (LUMIER), 
mammalian protein-protein interaction 
trap (MAPPIT), protein arrays and protein- 
fragment complementation assay (PCA). 
Although these are orders of magnitude 
slower than Y2H, they can probe interactions 
in a more relevant context. 

MAPPIT is one of the highest-through- 
put mammalian screens. Instead of a yeast 
transcription factor, a mammalian cytokine 
receptor is split and becomes capable of cell 
signalling only when reconstituted. In 2009, 
Jan Tavernier, a network biologist at VIB, 
a life-sciences research institute in Ghent, 
Belgium, described a higher throughput ver- 
sion of MAPPIT in which plasmids encoding 
potential interaction partners linked to one 
cytokine receptor fragment can be indi- 
vidually spotted into wells and stored®. To 
begin the experiment, wells are filled with 
cells expressing the cytokine-receptor fragment
linked with the selected ‘bait’. When
interactions occur, signalling activates the 
light-emitting enzyme luciferase. 

Using multiwell plates it costs about €2,000
(US$2,600) to screen one bait protein against
the human ORFeome (a complete set of cloned
protein-encoding open reading frames), says
Tavernier, who hopes to describe techniques
to run MAPPIT on microchips later this year.
Miniaturized assays should reduce the cost
to €100 and allow the ORFeome to be tested
against 100 baits a week.

At this throughput and cost, Tavernier 
says, new kinds of experiments become fea- 
sible. Instead of restricting screens to yeast 
cells, “you start mapping full interactomes in 
the appropriate species”, he says. In addition,
Tavernier plans to compare how interactomes 
change when cells are treated with agents such 
as drugs or toxic chemicals. He is hoping to 
commercialize the technology, and is working 
with Vidal and other scientists to map human 
protein interactions using both MAPPIT and 
Y2H assays. 

LUMIER assays are also relatively high- 
throughput and can be used to test whether 
particular interactions are affected by drugs, 
hormones or other additives. For these assays, 
cells are transiently transfected with two pro- 
teins. One protein is attached to a hydrophilic 
peptide called FLAG. Potential interaction 
partners are linked with luciferase. Cells are 
lysed, the FLAG-tagged proteins are captured 
and the presence of the interacting partners 
can be detected by the light they give off’. 

Protein-fragment complementation assays, 
which can be conducted in yeast as well as 
mammalian cells, rely on reconstituting a 
wide range of ‘reporters’, often enzymes or
fluorescent proteins. Since the reporters can 
signal throughout the cell, interactions can be 
detected where they naturally occur. 

In a collection of articles published in 
January 2009, Vidal, Wanker and others 
described what Vidal terms an empirical 
framework for assessing protein interac-
tions found in high-throughput screens. In
practice, this means repeating experiments 
using different types of assay and comparing 
the results with sets of controls. The positive 
controls are a reference set of about 100 well-established
interactions carefully selected from the literature. The
negative controls are some 100 randomly assigned pairs that
have never been observed together. Conditions of the assays
are adjusted to boost detection of positive controls without
raising the detection of random interactions.

“We are getting a sense of a cell’s organizational self by
having a validation network.” Marc Vidal

As part of a framework put forth in
Nature Methods, results from interaction
studies should be confirmed in different 
types of assays. The more methods that find 
an interaction, the more confident researchers 
can be. Still, collectively, these assays detect 
only about 70% of the positive reference set 
(see ‘Beyond binary interactions’). 
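
The calibration logic described here can be sketched in a few lines of Python. This is only an illustration: it assumes each assay's output is stored as a set of detected protein pairs, and the pair names, toy results and function names (`detection_rate`, `assay_support`, `make_pair`) are hypothetical rather than taken from the published framework.

```python
def make_pair(a, b):
    """Order-independent protein pair, so (A, B) equals (B, A)."""
    return frozenset((a, b))


def detection_rate(detected, reference):
    """Fraction of a reference set that an assay detected."""
    if not reference:
        raise ValueError("reference set is empty")
    return len(detected & reference) / len(reference)


def assay_support(results, candidate):
    """Number of assays in which a candidate pair was detected."""
    return sum(candidate in detected for detected in results.values())


# Hypothetical toy reference sets (the real ones hold about 100 pairs each).
positive_refs = {make_pair("P1", "P2"), make_pair("P3", "P4"),
                 make_pair("P5", "P6"), make_pair("P7", "P8")}
random_refs = {make_pair("R1", "R2"), make_pair("R3", "R4"),
               make_pair("R5", "R6"), make_pair("R7", "R8")}

# Hypothetical detections per assay type.
results = {
    "Y2H": {make_pair("P1", "P2"), make_pair("P3", "P4"), make_pair("R1", "R2")},
    "MAPPIT": {make_pair("P1", "P2"), make_pair("P5", "P6")},
    "LUMIER": {make_pair("P1", "P2"), make_pair("P3", "P4")},
}

for name, detected in results.items():
    positives = detection_rate(detected, positive_refs)
    randoms = detection_rate(detected, random_refs)
    print(f"{name}: positive set {positives:.2f}, random set {randoms:.2f}")

# Pooling assays raises coverage of the positive set but rarely reaches 100%.
union = set().union(*results.values())
combined = detection_rate(union, positive_refs)
print(f"combined coverage of positive set: {combined:.2f}")
```

In this toy data, one positive pair is missed by every assay, which mirrors the point that even several methods together recover only part of the reference set.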
Yeast two-hybrid assays can probe hundreds of thousands of potential protein interaction pairs a week. 


High-throughput experiments are not 
the only way to identify protein-protein 
interactions. Several databases, such as the 
Biological General Repository for Inter- 
action Datasets (see thebiogrid.org) and 
IntAct (see www.ebi.ac.uk/intact), compile 
lists of interactions as they are published in 
the literature, culling from both small-scale 


and high-throughput experiments as well as 
predicted interactions inferred from other 
analyses. But this list is not even close to 
complete, says Sandra Orchard, a proteomics 
service coordinator at the European Bioinfor- 
matics Institute in Hinxton, UK, who helped 
to develop minimal information standards to 
help share and evaluate interaction data. “We 


Beyond binary interactions 


Rather than individual proteins, protein 
complexes are the functioning biochemical 
entities in a cell, says Nevan Krogan, a 
systems biologist at the University of 
California, San Francisco. “If you think about 
a protein, more often than not it is in a protein 
complex.” If a complex requires multiple 
components to form, two-hybrid studies 
cannot be expected to detect it. 

Instead, complexes are generally studied 
in ‘pull down’ assays. The gene for a protein 
of interest is fused to a peptide tag that 
allows it to be ‘fished’ from cell lysates; less 
commonly, antibodies are used to purify out 
unlabelled proteins. The captured protein 
pulls its associates with it, and these are 
identified, usually by mass spectrometry. 
Companies such as Agilent Technologies 
in Santa Clara, California; Cell Signaling 
Technology in Danvers, Massachusetts; and 
IBA in Göttingen, Germany, offer transfection 
vectors and capture technologies. 

Typically, researchers tag one protein 
of interest in each experiment, and these 
are then captured on a column. If not 
enough protein is captured, researchers 
can use experiments that add further tags 
and capture steps. If a complex is fragile, 
crosslinking reagents can be added to cells, 
binding nearby proteins together. 

Although transient and fragile components 
of complexes are hard to detect, the biggest 
problem in pull down assays is background 
— proteins that are not part of a complex 
but get pulled along. Researchers are 
getting much better at telling which proteins 
observed with a tagged protein are actually 
part of a complex, says Anne-Claude Gavin, 
who studies protein complexes at the 
European Molecular Biology Laboratory in 
Heidelberg, Germany. 

The secret is statistics. If a protein complex 
has ten components, there should be ten 
ways to pull down the complex, explains 
Gavin. In a large data set involving many 
tagged proteins, she says, the same protein 
should occur in three forms: as the tagged 
protein, as an interaction partner and as 
background. Several scoring systems are 
used to sort artefacts from real interactions 
and to identify the components of protein 
complexes. The key, says Krogan, is to collect 
enough data. “For these scoring systems to 
be effective, you need pull downs of many 
types of proteins.” 

Krogan and his colleagues recently used 
pull down experiments to identify complexes 
formed between HIV proteins and host 
proteins. Disrupting these interactions could 
prevent the virus from entering cells. They 
selected 18 HIV proteins and tagged them 
in two ways, using FLAG and Strep tags. The 
will be lucky if as much as 30% of the yeast 
interactome has been observed,” she says. For 
the human interactome, she estimates that the 
figure is less than 10%, including published 
results that are not captured in the databases. 


WHEN TO BELIEVE 

Biologists rely on interaction data in several 
ways. They often layer protein-protein inter- 
action networks onto other networks. After 
identifying transcription factors that regulate 
a gene, for example, they search databases and 
literature for transcription factors’ interaction 
partners. Researchers also explore how sets of 
proteins are connected to each other, and then 
ask questions based on the structure of the 
network, such as classifying the proteins that 
have the most interaction partners. But not all 
interaction data are equal, warns Russell Finley, 
a network biologist at Wayne State University 
School of Medicine in Detroit, Michigan, who 
believes that incorporating quality measures 
could make the data substantially more power- 
ful. At present, he says, savvy researchers filter 
out interactions unless they have been observed 
more than once through different methods, 
but these ‘intuitive filters’ can be biased. For 
example, the more often a protein is studied, 


team developed a scoring system called 
mass spectrometry interaction statistics to 
sort the interactions. The system compiles a 
single score on the basis of abundance (more 
abundant proteins are more likely to be 
background), reproducibility and specificity. 
Almost 500 proteins reached this threshold; 
only 19 of these had already been reported. 

This system vastly increased the number 
of potential targets for drugs that could treat 
HIV, and revealed a way to perform further 
studies on a previously intractable HIV 
protein — Vif, which was known to overcome 
human cells’ antiviral defences but was hard 
to study biochemically and structurally. The 
reason, says Krogan, was missing interaction 
partners. “You needed all the components of 
the complex there to make it behave.” 

Now, researchers will be able to find 
potential ways to disrupt the complex, 
which could lead to new anti-HIV drugs. 
The real challenge, says Krogan, is to 
continue to mine the identified interaction 
partners for biological meaning: low- 
throughput work that requires extensive 
follow-up study. “The main goal of making 
these maps is not generating these 
maps. It is to extract biological insight, 
mechanism and testable hypotheses,” he 
says. “Sadly, work almost always stops 
at the maps.” M.B. 




the more interactions will be found. Finley says 
that a better approach would be to consider all 
the data available and assign a score reflecting 
the likelihood that an interaction is real. Com- 
puter analyses could then be used to consider 
more interactions, giving more weight to those 
with higher confidence scores. 
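
One simple way to turn this idea into numbers is a "noisy-OR" score, sketched below. This is an illustrative stand-in, not Finley's method: the per-method reliabilities are invented, and the rule that repeated reports from the same method count only once is an assumption made here to limit the bias towards heavily studied proteins.

```python
from math import prod

# Hypothetical probability that a single report from each method is real.
METHOD_RELIABILITY = {"Y2H": 0.5, "pull-down": 0.6, "PCA": 0.5}


def confidence(observations):
    """Noisy-OR score: chance that at least one distinct method's report is real."""
    methods = set(observations)  # repeated reports by one method count once
    return 1.0 - prod(1.0 - METHOD_RELIABILITY[m] for m in methods)


def weighted_degree(scores, protein):
    """Sum of confidence scores over a protein's putative partners."""
    return sum(s for (a, b), s in scores.items() if protein in (a, b))


# Hypothetical evidence: which methods reported each pair.
evidence = {
    ("A", "B"): ["Y2H", "pull-down"],   # two independent methods
    ("A", "C"): ["Y2H", "Y2H", "Y2H"],  # same method reported three times
    ("B", "C"): ["PCA"],
}
scores = {pair: confidence(obs) for pair, obs in evidence.items()}

for pair, score in scores.items():
    print(pair, round(score, 2))
print("weighted degree of A:", round(weighted_degree(scores, "A"), 2))
```

Under these assumptions no interaction is discarded outright; a downstream network analysis can instead weight every edge by its score, so that a pair seen by two different methods counts for more than one seen many times by the same method.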

But an interaction can occur and have no 
actual consequences. “The real question is 
what interactions have meaning in the first 
place,” says Stephen Michnick, a biochemist at 
the University of Montreal, Canada. “An inter- 
action can be quite good, that is, reproducible 
in multiple assays, but not be biologically 
important.” In other words, the interaction 
has no discernible effects: it does not start or 
stop a molecular machine, activate an enzyme 
or send another protein to destruction. 

Michnick came to these conclusions after 
conducting a comprehensive study that 
allowed protein interactions to be studied in a 
more natural context. In the protein-fragment 
complementation assays, interacting proteins 
reconstituted an enzyme that yeast needed to 
survive under culture conditions. This identi-
fied about 3,000 new interactions, with many 
involving membrane and other proteins that 
cannot reach the cell nucleus. 

But thousands of other protein interactions 
were observed with less confidence. “We were 
surprised that there were known proteins that 
made too many interactions or made interac-
tions that didn’t make biological sense,” Mich-
nick recalls. “We thought we had the perfect 
method, and so we would get perfect results.” 
“So we thought, if we are seeing junk interac- 
tions and other people are seeing junk, what 
is the junk?” The answer, he believes, is that 
these are naturally occurring ‘junk interac-
tions’ that, like sections of DNA that do not 
seem to have a function, simply exist. 

Michnick believes that perhaps as many 
as half of the interactions observed even in 
rigorous screens have no biological function. 
Abundant proteins should be treated with 
particular scepticism, but if the same pairs of 
proteins are consistently found together and 
not with other proteins then that interaction 
is more likely to be real, and the same is true of 
interactions identified across multiple species. 
“The parts that are functional have to be dis-
sected from the rest of what’s there,” Michnick 
says. 

Trey Ideker, a network biologist at the 
University of California, San Diego, is more 
worried that such a small percentage has 
been observed at all. “It’s not clear how you 
can shortcut to the functional interactions 
without some unbiased way of getting all the 
interactions,” he says. “We have a flashlight 
illuminating 20% of the yard, but the other 
80% is dark.” In fact, no one yet knows how 
big the universe of interactions is, he says, “but 
everyone agrees that we are not even close to 
having mapped it”. 

Nonetheless, more interactions have been 


identified than can be individually investi- 
gated. For Ideker, the best approach is to think 
in terms of databases. “I have this big ‘gemisch’ 
of interactions, how do I best query it?” 


DATA COMBINATIONS 

One strategy is combining diverse data sets 
around focused questions. For example, Ideker 
decided to conduct a Y2H screen that would 
pull out interactions involved in the mitogen- 
activated protein kinase (MAPK) signalling 
cascade — an important drug target that 
regulates processes such as cell growth, differ- 
entiation and survival. Ideker and his colleagues 
picked 150 proteins associated with the path- 
way and hunted for their interaction partners 
using Y2H assays. This revealed more than 
2,000 interactions among about 1,500 proteins. 

From these they selected a dozen or so pro- 
teins that had not previously been associated 
with the MAPK cascade and used RNA inter- 
ference to knock down the expression of the 
identified interaction partners. In about one- 
third of the cases, RNA knockdown altered gene 
expression within the cascade, indicating that 
these interactions were functional. Follow-up 
studies provided the first experimental evidence 
that a protein called NHE-1 served as a MAPK 
scaffold. 

By starting with the interactions and whittling them away
with other data, the researchers can uncover new biology,
says Ideker. “It’s the superposition of biophysical and
functional data that is really going to save the day here.”

“It’s the superposition of biophysical and functional data
that is going to save the day.” Trey Ideker

Researchers can also glean insight from how proteins
interact physically. This year, Haiyuan Yu and his colleagues
at Cornell University,
Ithaca, New York, showed how combining data 
about protein-protein interactions and protein 
structure could suggest how certain mutations 
cause disease. 

They combined several established data sets 
of protein-protein interactions, the physical 
structure of those interactions, and genetic 
measurements to show that when mutations do 
not prevent proteins from being expressed but 
still cause disease, they are more likely to occur 
in the interface between interacting proteins 
than elsewhere. “For the past decade, biologists 
have been using this mathematical definition. 
Every protein is a mathematical dot. But we 
know that protein structure is fundamentally 
important for function,” says Yu. 

Information about whether an interaction 
occurs in a specific cell type or under certain 
conditions could go a long way to revealing its 
function, says Anne-Claude Gavin, who studies 
protein complexes at the European Molecular 
Biology Laboratory in Heidelberg, Germany. 
“Interactions have to be context-dependent; 
they have to start at one time and stop at 
another.” But these studies are difficult and are 
rarely done. “This is a level of sophistication 
that we just don’t understand,” she says. 

To understand a protein-protein interaction 
in context, researchers need to single them out 
for focused studies. 

Sometimes, screening techniques can be 
adapted to follow particular interactions in 
depth. For example, complementation assays 
with fluorescent proteins or luciferase can be 
used to follow interacting proteins. Because 
different coloured fluorescent proteins are so 
similar, one protein can be tested for interac- 
tions with two or more proteins in the same cell. 
One protein is labelled with a fragment of yellow 
fluorescent protein, a second with a fragment 
of cyan fluorescent protein and another inter- 
rogated protein carries a fragment common to 
both fluorescent proteins. This can show which 
protein interactions are occurring and where 
in the cell they occur. Complementation assays 
with luciferase can also be used with multiple 
colours of proteins and have the advantage that 
the enzyme easily breaks apart and reforms, 
allowing researchers to study how interactions 
can be disrupted. Imaging techniques such as 
bioluminescent resonance energy transfer and 
fluorescent resonance energy transfer can be 
used in living cells. They use genetically tagged 
proteins that emit light when proteins come 
into contact with each other, and so are used in 
a variety of assays. Other assays label each of two 
proteins and then monitor whether they move 
together in cells. 

Although slower and more expensive than 
large-scale screening efforts, one-at-a-time 
explorations of interactions are essential, says 
Uetz. “Eventually you want to drill down into 
the actual interactions.” 


Monya Baker is technology editor for Nature 
and Nature Methods. 
INNOVATION 


The big idea of technology transfer 


Working at the interface between science and business 
offers an opportunity to bring ideas to market. 


BY CHARLOTTE SCHUBERT 


As Angela Loihl worked her way through 
her graduate research project at the 
University of Iowa in Iowa City, she 
noticed the scope of her studies getting nar- 
rower. “I spent all my time learning more 
and more about less and less,” she says of her 
research in mice, which focused on a protein 
thought to have a role in stroke. “I questioned 


how relevant that was to the human condition.” 

Fourteen years after she earned her PhD, her 
career is far broader. As a technology manager 
at the Center for Commercialization at the 
University of Washington in Seattle, she covers 
a wide range of life-science fields — from 
microbiology to radiology, with an occasional 
foray into chemical engineering. 

Loihl’s role is to take scientific ideas from 
academia and negotiate their transfer to 
biotechnology start-ups and pharmaceutical 
companies, at which point the most promis- 
ing leads may become new therapies. Loihl 
is routinely presented with intriguing, and 
sometimes thorny, questions. What is the 
commercial potential of a particular university 
technology? Which company will license the 
rights to which innovation? Are a researcher's 
findings strong enough to launch a spin-off? 
Loihl has helped to foster technology related 
to influenza treatments, tests to assess the risk 
of cardiovascular disease and a vaccine for the 
type 2 herpes simplex virus. “Every day I learn 
something new from the science and business 
perspective,” she says. “I love this job.” 
Working in technology transfer demands 
teamwork and the ability to assess a huge 
range of scientific areas. Those who thrive 
have the right mixture of extroversion, 
scientific breadth and business sense. They are 
also able to juggle multiple projects at once. 
The hardest thing, says Loihl, is keeping 
everyone’s expectations realistic, such as 
ensuring a researcher does not overestimate 
the commercial value of his or her work, or 
that a company does not underestimate it. “If 
you do something well by one stakeholder, it 
usually means you are upsetting another,” says 
Loihl, adding that the variety of duties and 
challenges is what makes her job great. 


WHERE THE JOBS ARE 

Fourteen years ago, there were 26 employees at 
what was then called the office of technology 
transfer at the University of Washington. Now 
there are 54. The growth, which is not atypical 
for large academic institutions in the United 
States, is due in large part to the Bayh-Dole Act 
of 1980. The act changed the pace and man- 
ner of innovation in the United States, giving 
universities and not-for-profit organizations 
control over the intellectual-property rights 
of federally funded research done within the 
institution. The result was a huge increase in 
technology-transfer related opportunities, par- 
ticularly in the 1990s as universities mined this 
new source of revenue. 

The expanding offices have taken on func- 
tions beyond the usual roles of patenting and 
licensing technology, resulting in new types of 
work. At the Center for Commercialization, 
office head and entrepreneur Linden Rhoads 
leads a team of 15 technology managers (also 
known as licensing managers or licensing spe- 
cialists), which are typical positions for people 
making the jump from science, as Loihl did. 
Rhoads also employs specialists who team 
up with the managers on various projects. 
“We have all kinds of people in support roles,” 
she says. 

For instance, three industry-relations officers 
help the technology managers to understand 
industry needs and to forge collaborations 
between faculty members and companies. 
Most of these officers are scientists or engineers 
with extensive business experience. Four peo- 
ple, most of whom have business experience 
at small start-up companies, are in a ‘New 
Ventures’ group, which works with technology 
managers and researchers to start companies 
by inviting potential investors to the office, for 
example, and recruiting business people for 
an ‘entrepreneurs-in-residence’ programme. 
The New Ventures group also boasts a grant 
writer, who can pull in funding from sources 
such as the US Small Business Administration 
in Washington DC to foster a start-up. 

Patent agents in Rhoads’ team, some of 
whom have law degrees, write the lengthy 
and technical patent applications and han- 
dle the continual back-and-forth between 
the patent office and researchers to keep 
the application moving forward — a task 
that technology managers often perform 
in smaller offices. Loihl advises job seekers 
to “scrutinize the office to see what kind of 
resources are available”. 

Although Rhoads is still hiring staff, the 
big expansion in the number of US tech- 
nology-transfer jobs has slowed, says Robin 
Rasor, director of licensing at the University 
of Michigan in Ann Arbor and former presi- 
dent of the Association of University Tech- 
nology Managers in Deerfield, Illinois (see 
‘Growth subsides’). The association’s annual 
survey of technology-transfer offices at US 
universities, hospitals and research institu- 
tions showed a slight drop in the number of 


GROWTH SUBSIDES 


Number of licensing professionals at eight major 
US research universities shows a slight decrease. 
full-time licensing jobs from 1,041 to 1,018 
between 2009 and 2010, at the 172 institutions 
that responded in both years. Job markets in 
Germany and the United Kingdom, which 
have long had laws similar to Bayh-Dole, 
are likewise slowing, says Anders Haugland, 
president of the Association of European 
Science and Technology Professionals, based 
in Leiden, the Netherlands. But ongoing 
changes in patent law and technology-transfer 
policies elsewhere have created job potential, 
notes Haugland, who also heads a technology- 
transfer office of seven institutions, including 
the University of Bergen and Haukeland 
University Hospital, in Norway. 

Norway adopted policies similar to the 
Bayh-Dole Act in 2003. The country’s uni-
versities now retain one-third of the pro- 
ceeds from intellectual property arising from 
research conducted on their campuses. Pre- 
viously, faculty members generally retained 
the full rights. Back then, Haugland’s tech- 
nology-transfer office did not even exist. It 
now has 23 employees, some of whom have 


JUMPING TO INDUSTRY 


Making the move to the private sector 


Working in a technology-transfer office 
does not mean staying in academia. Some 
use their technology-transfer experience to 
make the leap to the private sector. 

“We are at the other side of the deals,” 
says Polly Murphy, a vice-president at 
drug firm Pfizer in San Diego, California, 
who leads 30 people in the business 
development team for the company’s 
research and development branch. The 
team negotiates transactions with other 
companies, not-for-profit organizations, 
universities and other institutions. Most 
of the team came from the technology- 
transfer world, which, says Murphy, is the 
best place to learn how to draft licensing 
and collaboration agreements. Working in 
technology transfer can also lead to a job in 
a contracts and outsourcing office, which 
manages agreements for clinical trials and 
other projects. But Murphy cautions that 
shrinkage in the industry has made such 
jobs scarce. “In my department nobody ever 
leaves,” she says. 

Other technology-transfer employees 
head to biotechnology firms or use their 
scientific knowledge and understanding of 
the market to work in a venture-capital firm. 
But biotechnology start-ups come and go, 
along with their licensing offices. Those who 
are risk averse should be wary of the jump 
to industry, says Anders Haugland, head of 
the Association of European Science and 
Technology Professionals in Leiden, the 
Netherlands. And universities and research 
institutions often have better benefits. C.S. 
come from as far as India, Croatia and Lithu-
ania. “We are hiring people every year and we 
are growing,” says Haugland. 


GETTING THE JOB 

As Loihl became disillusioned with bench 
science, she began inquiring about internship 
opportunities at the then-small technology- 
transfer office at the University of Iowa, but they 
showed little interest. “They didn’t know how 
they would use me,” she says. 

But times have changed. Technology-trans- 
fer offices now offer various paid and unpaid 
internship programmes. The University of 
Washington has a typical technology-licensing 
intern programme. Interns prepare summaries 
about university researchers’ innovations 
to facilitate licensing negotiations, and help to 
evaluate whether a new technology can be pat- 
ented and whether there is a potential market 
for it. Often an internship can lead to a longer, 
salaried apprenticeship, says Rhoads. A technol- 
ogy-transfer role may also lead to a position in 
industry (see ‘Jumping to industry’). 

An internship is the best way to get a foot in 
the door, especially for those making the transi- 
tion from science, says Rasor. She and Rhoads 
have both hired interns who came straight 
from postdoctoral fellowships or graduate pro- 
grammes as junior technology managers. 

Courses in business, marketing or law can 
help to land a job or internship. PraxisUnico, a 
not-for-profit organization in Cambridge, UK, 
offers a three-day ‘fundamentals of technology 
transfer’ course, which is popular for those new 
to the sector, says Alison Campbell, a consultant 
and chair of PraxisUnico’s training committee. 
Aspiring technology managers can learn the 
basics of patenting, trademarks and licensing, 
as well as marketing and negotiation tactics. 

Although not all of Rhoads’ team have a 
master’s in business, she says that the degree 
can help with tasks such as assessing the mar- 
ket potential for a technology — for example, 
by researching the size of the market, cost of 
goods or pricing models. Rhoads herself has a 
law degree, which helps in understanding the 
complicated language of contracts. But in the 
end, it may be business experience that carries 
the most weight. “If you have someone who has 
any kind of experience in industry, that is a big 
plus,” says Rasor. Experience in a business office 
is preferable, but even work as a researcher at 
a biotechnology company can help, she says. 

There is no typical route into a technology- 
transfer role. And ultimately, less-tangible 
abilities, such as communication and people 
skills, could be what leads to success in the field. 
“People need to be good at having consultations 
with scientists, turning over rocks and intro- 
ducing researchers from industry to academic 
researchers,” says Rhoads. Those who thrive in 
the job, she says, are “natural connectors”. 


Charlotte Schubert is a freelance journalist 
based in Seattle, Washington. 
COLUMN 


Rating research risk 


Too many young physicists embark on projects without knowing the risks. There is a 
better way, argues Abraham Loeb. 


In physics, the value of a theory is measured 
by how well it agrees with experimental 
data. But how should the physics commu- 
nity gauge the value of an emerging theory 
that cannot yet be tested experimentally? With 
no reality check, a less than rigorous hypoth- 
esis such as string theory may linger on, even 
though physicists have been unable to work out 
its actual value in describing nature. 

This sort of uncertainty has implications 
not only for the gathering of knowledge for 
the scientific enterprise, but also for fledgling 
physicists. The investment of research time in 
strong intellectual assets is crucial for graduate 
students who want to establish their careers on a 
good foundation. But not all young researchers 
are aware of the history that accompanies every 
research area. They often have to rely on word 
of mouth from their PhD adviser or colleagues 
for this information. 

What if the physicists could call on a ratings 
agency, not unlike a lender would do before 
deciding whether to offer credit? I am advo- 
cating the creation of a website that is operated 
by graduate students and that will use various 
measures of publicly available data (such as the 
number of newly funded experiments, research 
grants, publications and faculty jobs) to gauge 
the future returns of various research frontiers. 


THEORY BUBBLES 
The study of the cosmic microwave background 
provides an example of how theory and data can 
generate opportunities for young scientists. As 
soon as NASA’s Cosmic Background Explorer 
satellite reported conclusive evidence for the 
cosmic microwave background temperature 
fluctuations across the sky in 1992, the sub- 
sequent experimental work generated many 
opportunities for young theorists and observers 
who joined this field. By contrast, a hypothesis 
such as string theory, which attempts to unify 
quantum mechanics with Albert Einstein’s 
general theory of relativity, has so far not been 
tested critically by experimental data, even over 
a time span equivalent to a physicist’s career. 
Senior scientists might seem the people best 
suited to rate the promise of research frontiers. 
But too many of these physicists are already 
invested in evaluating the promise of these spec- 
ulative theories, implying that they could have a 
conflict of interest or be wishful thinkers. Hav- 
ing these senior scientists rate future promise 
would be akin to the AAA rating that financial 




agencies gave to the very debt securities from 
which they benefited. This unseemly situation 
contributed to the last recession, and a long- 
lived bias of this type in the physics world could 
lead to similarly devastating consequences — 
such as an extended period of intellectual stag- 
nation and a community of talented physicists 
investing time in research ventures unlikely to 
elucidate our understanding of nature — a the- 
ory ‘bubble’, to borrow from the financial world. 

Of course, graduate students are busy. But 
they could serve a limited term of service for 
maintaining the site and be government- 
funded. For example, students supported by US 
National Science Foundation fellowships main- 
tain astrobites.com, a website that summarizes 
new astrophysics papers. 


CREDIT RATING 

The physics ‘credit-rating’ website would use 
evaluation metrics to factor in, with the cor- 
rect weighting, all the ingredients that would 
ultimately make scientific research successful. 

For physics, this might include the existence 
of an underlying self-contained theory from 
first principles, the potential for experimental 
tests of this theory and a track record of related 
research programmes. Clearly, factors such as 
intellectual excitement cannot be quantified, but 
as long as funding agencies are supporting pro- 
jects and the information provided is accurate, 
the data about the growth of a field should echo 
this ‘excitement factor’. 

The evaluation metric would have to be pre- 
determined and supported by numbers that are 
based on archival data gathered through auto- 
mated searches for keywords in electronic data 
archives (see arxiv.org or nsf.org). Aside from 
automated searches, practitioners from fields that 
are being evaluated could submit supplementary 
data that would be incorporated into the analysis. 

The entire data set would include the level of 
funding allocated to experiments and research 
grants, the status of the underlying theory and 
the number of publications and faculty jobs 
within the particular field of research. The 
simplest model relates the change in these 
parameters to a linear combination of their 
values. For example, the publication rate is 
expected to relate to a linear combination of 
the number of experiments, faculty jobs and 
available research funds. With the right mix of 
time spent on theory, experimental work and 
grant support, a research frontier would show 
exponential growth in this linear model. The 
next step would be to calibrate this model using 
historical data about the growth of successful 
research frontiers. 
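
Written out, the linear model described above could take the following form. The notation is an assumption introduced here for illustration and does not appear in the column: the vector x(t) collects the tracked quantities (funding, experiments, publications, faculty jobs) and A is a constant matrix of weights, assumed diagonalizable, that would be calibrated against historical data.

```latex
\frac{d\mathbf{x}}{dt} = A\,\mathbf{x}
\quad\Longrightarrow\quad
\mathbf{x}(t) = e^{At}\,\mathbf{x}(0)
             = \sum_i c_i\, e^{\lambda_i t}\,\mathbf{v}_i ,
\qquad A\,\mathbf{v}_i = \lambda_i \mathbf{v}_i .
```

Here the coefficients c_i are fixed by the starting values x(0). In this reading, a field grows exponentially when at least one eigenvalue of A has a positive real part and the corresponding c_i is non-zero, which is one way to interpret "the right mix" of inputs.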

The website could be helpful to institu- 
tions and governments, not just to individual 
scientists. A balanced assessment of the level 
of risk and potential benefits from emerging 
research frontiers can increase the efficiency 
of the workforce, leading to stronger growth. 
And it could help funding agencies to optimize 
their allocation of money to promote progress 
in research. In fact, it would be in the interests 
of funding agencies to support the website and 
help the students to take part (for example, 
through special grants or fellowships). 

The website might also convince senior 
researchers to shift their focus to new research 
areas, perhaps as a result of the influence that the
rating procedure may have on funding agencies. 
But maintaining balance and ensuring diversity 
among subfields, taking some risks and avoid- 
ing funnelling resources into a small number 
of successful but conservative programmes are 
important considerations for funding agencies 
(A. Loeb Nature 467, 358; 2010). 

Nearly every worthwhile endeavour involves 
some risk. But mitigating that risk, and helping 
young scientists to make informed decisions 
about the field in which they should invest their 
time and intellect, would yield a more efficient 
scientific enterprise.


Abraham Loeb is chair of the astronomy
department and director of the Institute
for Theory and Computation at Harvard
University in Cambridge, Massachusetts.

FUTURES SCIENCE FICTION


NOR CUSTOM STALE

An antic disposition.


BY ANATOLY BELILOVSKY 


Old people move slowly. A knee may
work just fine today and buckle
tomorrow. A tiny turn can change
a functioning joint into a monolithic block
of agony.

Old people need to remember things like 
that. 

“You look great,” said Bob, looking 
down and chewing at his dentures. 

Bob had been a good striker, back in 
the day, but he never learned to fool goal- 
ies. He always looked where he wanted 
the ball to go. So in the end, I went pro, 
and Bob went into insurance. 

“Really good,” Bob added. His feelings 
littered his face the way his lunch deco- 
rated his tie: a forced tight, guilty smile 
with pity in the tilt of his head, and a tiny 
trace of Schadenfreude in the crow’s feet 
around his eyes. 

I knew how I looked. I looked terrible:
slumped shoulders, shuffling gait, shabby 
jacket, stained pants. 

“Thanks,” I whispered. “Look at us.
Sixtieth reunion. Who’d’a thunk we’d live
this long, eh, roomie?”

Bob looked up, his smile genuine now. 
“Athletes don't age as fast,” he said. “I still
play nine holes, every week. Like clock- 
work. Keeps me young.” 

I nodded. “It's working,’ I whispered. 
“You haven't changed a bit” 

He barked out a single laugh. “Right.
Not a bit,” he said. Leaning back, he turned
his head slowly through a short arc, sweeping
his gaze over the far side of the quad.
“Princeton sure has changed,” he continued.
“There's dorms where soccer fields used to be. 
Where you and I played. Mixed dorms! Not 
just co-ed. Mixed. Ain't that something!” He 
slapped his knee, winced slightly. 

I nodded again. 

“Trouble with your voice?” he asked. 

“Sort of,” I whispered.

He sighed. “Joanie had a stroke, she can’t
talk at all,” he said. “And Todd died last year
of throat cancer, he had a tracheostomy. Had
to plug the hole in his neck when he wanted
to say anything.”

I sighed, too, and lowered my eyes. Joanie 
and I lived together, sophomore year. I’d 
never fool her for a minute. 

“I worked as long as I could, only retired 
after I got my bypass,” he said. His face took 
on that look again: guilt, pity and a dash of 
gloat. Same look he had when I’d told him
Joanie had left me. Minus the dentures and
the hand tremor. “Joanie gets round-the-clock
nursing. Government pays for everything,”
he added. “Wouldn’t wanna get old
in a country where you pay for medicine.”

Joanie moved in with him a week later. 
Maybe that was why I went overseas, so I’d
never have to see that look again. In either 
of their faces. 


Bob reached for my shoulder, his fingers 
trembling as if rolling an invisible cigarette. 
I fought the urge to move away. 

“I heard health care is expensive where
you live,” he said. 

I nodded, with far more force than I 
expected. 

Bob looked up sharply. 

I winced and rubbed my neck a fraction 
of a second later. Bob leaned closer to me. 
His shiny dentures clashed with his cracked, 
spit-flecked lips; his eyes, once brown, were 
now ochre iris on yellow sclera. 

“You could get your citizenship back,” he 
said, barely moving his lips. “My grandson is
a damn good immigration lawyer.”

I shrugged. 

“Think about it,” he said.

“I will,” I said. “But now I have to go. Don’t
want to miss my flight.”

“Stay with me,” he said quickly. “We’ve
guest rooms at the home. We can go see
Joanie tomorrow.”

I counted to five and held my breath: an 
old voice actor’s trick. 

“No,” I said.

The word came out as I intended: with 
longing. With reluctance. With regret. 

Bob shook my hand and hobbled away as 
quickly as he could. 


I had to sprint across the terminal to reach
my gate on time. People stared. My wrinkles
itched; I detoured to the lavatory
to peel them off. The TranSec agent
looked at me with a suspicious squint.

I called the Farm from the plane
while it waited for clearance. Gulnara
answered.

“Well, hello, stranger,” she purred. “Do 
we have a date?” 

“Sure,” I said. “When can you fit me 
in?” 

“You want a quickie,” she said, “or the
whole jalapeño?”

“Enchilada,” I said. Gulnara’s English 
was perfect. It was American she had 
trouble with. “I want the whole enchilada.
It's been a while.”

“So I see.” She paused; I heard keys tapping.
“I have an opening Wednesday. Is
this good?”

“Sure,” I said. “Training camp doesn’t 
start for another fortnight.” 

I heard tapping again. “Excellent. Full
rejuvenation, a five-day course starting
Wednesday, shall I debit your fee now?”

“Go ahead,” I said. “Did it go through?”

“With your credit rating?” she said. “Of
course it went through.” She paused. “I’m so
glad you haven't retired. Watching you play
— it never gets old. It’s like, you are not just 

playing soccer, but also poker and chess at
the same time. Does this make sense?”

“Sweetheart,” I said, “I can't afford to retire.”

Her answer drowned in the turbine spin-up.
I disconnected my phone, leaned back,
turned on the viewscreen. The plane made
a climbing turn above central Jersey before
heading over the Atlantic. Somewhere
below, shabby, weed-choked Princeton sweltered
in the heat, and Bob shambled with a
cane to the train that would take him to his
nursing home.

Pity about poor Bob.


Anatoly Belilovsky learned English from 
Star Trek reruns and is now a paediatrician 
in New York. 
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The clonal and mutational evolution spectrum of 
primary triple-negative breast cancers 


Sohrab P. Shah, Andrew Roth*, Rodrigo Goya*, Arusha Oloumi*, Gavin Ha*, Yongjun Zhao*, Gulisa Turashvili*,
Jiarui Ding*, Kane Tse*, Gholamreza Haffari*, Ali Bashashati*, Leah M. Prentice, Jaswinder Khattra, Angela Burleigh,
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Primary triple-negative breast cancers (TNBCs), a tumour type 
defined by lack of oestrogen receptor, progesterone receptor and 
ERBB2 gene amplification, represent approximately 16% of all 
breast cancers. Here we show in 104 TNBC cases that at the time
of diagnosis these cancers exhibit a wide and continuous spectrum of 
genomic evolution, with some having only a handful of coding 
somatic aberrations in a few pathways, whereas others contain 
hundreds of coding somatic mutations. High-throughput RNA 
sequencing (RNA-seq) revealed that only approximately 36% of 
mutations are expressed. Using deep re-sequencing measurements 
of allelic abundance for 2,414 somatic mutations, we determine for 
the first time—to our knowledge—in an epithelial tumour subtype, 
the relative abundance of clonal frequencies among cases represent- 
ative of the population. We show that TNBCs vary widely in their 
clonal frequencies at the time of diagnosis, with the basal subtype of 
TNBC showing more variation than non-basal TNBC. Although
p53 (also known as TP53), PIK3CA and PTEN somatic mutations 
seem to be clonally dominant compared to other genes, in some 
tumours their clonal frequencies are incompatible with founder 
status. Mutations in cytoskeletal, cell shape and motility proteins 
occurred at lower clonal frequencies, suggesting that they occurred 
later during tumour progression. Taken together, our results show 
that understanding the biology and therapeutic responses of patients 
with TNBC will require the determination of individual tumour 
clonal genotypes. 

To understand the patterns of somatic mutation in TNBC, we 
enumerated genome aberrations at all scales from 104 cases of primary 
TNBC (Affymetrix SNP6.0, 104 cases; RNA-seq, 80 cases; genome/ 
exome sequencing, 65 cases: 54 exomes, 15 genomes with 4 overlap- 
ping) (Supplementary Table 1 and Supplementary Fig. 1), annotated 
with clinical information (Supplementary Table 2). We revalidated 
2,414 somatic single nucleotide variants (SNVs) (Supplementary
Table 3) with targeted deep sequencing to a median of 20,000×
coverage, including 43 non-coding splice site dinucleotide mutations
(Supplementary Table 4) and 104 genes with 107 indels (Supplemen- 
tary Table 5 and Supplementary Methods). Notably, the distribution of 
somatic mutation abundance varies in a continuous distribution 
among tumours (Fig. 1a) and seems to be unrelated to the proportion 
of the genome altered by copy number alterations (CNAs) (Fig. 1b) or 
tumour cellularity (Supplementary Fig. 2b). Although this distribution 
could be partially explained by a false-negative rate in mutation discovery,
others have noted similar distributions in epithelial cancers,
suggesting that the total mutation content of individual tumours may 
be shaped by biological processes or differential exposure to mutagenic 
influences in the population. 

The overall pattern (Supplementary Fig. 3a, b) of CNA abundance 
appears similar (Supplementary Fig. 4) to that seen in a larger, independent
series of ~2,000 SNP6.0 profiled breast tumours. Among the
most frequently observed CNA events (Supplementary Table 6) are the 
tumour suppressor and oncogenes PARK2 (6%), RB1 (5%), PTEN 
(3%) and EGFR (5%). Here we report intragenic deletions (Supplementary
Fig. 5) in the PARK2 tumour suppressor, specifically
linking PARK2 with TNBC for the first time. Consistent with previous
reports in breast cancer, we did not observe frequent recurrent structural
rearrangements (Supplementary Fig. 3d and Supplementary
Table 7), although we revalidated many individual fusion events invol- 
ving known oncogenes or tumour suppressors (for example, KRAS, 
RB1, IDH1, ETV6) (Supplementary Tables 8-10). 

A comparison of RNA-seq data with genomes/exomes data revealed 
that only 36% of validated somatic SNVs were observed in the transcriptome
sequence (Supplementary Table 3 and Supplementary Fig. 2b). In a
recent lymphoma study, similar proportions were observed (137 of 329
somatic mutations expressed in RNA-seq). As expected, the proportion
of low-abundance somatic SNVs observed in RNA is reflected in
the distribution of wild-type, heterozygous and homozygous expressed 
mutations (Supplementary Fig. 2b), consistent with the notion that 


1 Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia V6T 2B5, Canada.
2 Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia V5Z 1L3, Canada.
3 Canada’s Michael Smith Genome Sciences Centre, Vancouver, British Columbia V5Z 1L3, Canada.
4 Centre for Molecular Medicine and Therapeutics, 950 West 28th Avenue, Vancouver, British Columbia V5Z 4H4, Canada.
5 Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada.
6 British Columbia Cancer Agency, 600 West 10th Avenue, Vancouver, British Columbia V5Z 4E6, Canada.
7 Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
8 Department of Oncology, University of Cambridge, Hills Road, Cambridge CB2 2XZ, UK.
9 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA.
10 Departments of Oncology and Laboratory Medicine and Pathology, University of Alberta, 11560 University Avenue, Cross Cancer Institute, Edmonton, Alberta T6G 1Z2, Canada.
11 Life Technologies, 101 Lincoln Centre Dr., Foster City, California 94404, USA.
12 Department of Pathology and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California 94143, USA.
13 Brain Tumor Research Center, Department of Neurosurgery, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California 94143, USA.
14 Department of Computer Science, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada.
15 Centre for High-Throughput Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada.
16 Terry Fox Laboratory, BC Cancer Agency, 675 W 10th Avenue, Vancouver, British Columbia V5Z 1L3, Canada.
17 Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Dr., Burnaby, British Columbia V5A 1S6, Canada.
18 Centre for Translational and Applied Genomics, BC Cancer Agency, 600 West 10th Ave, Vancouver, British Columbia V5Z 4E6, Canada.
19 Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada.
20 Cambridge Breast Unit, Addenbrookes Hospital, Cambridge University Hospital NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge CB2 2QQ, UK.
21 Cambridge Experimental Cancer Medicine Centre (ECMC), Cambridge CB2 0RE, UK.
*These authors contributed equally to this work.




[Figure 1 plot. Panel a: somatic mutation content by case. Panel b: relationship of somatic mutations and CNA by case (number of HOMDs and HLAMPs; percentage genome altered). Vertical axis in both panels: number of somatic mutations.]

low-abundance alleles may represent rarer clones in the primary 
tumour. We found 43 splice junction mutations with evidence for an 
impact on splicing patterns (Supplementary Table 4), encompassing 
several known tumour suppressors (p53, PIK3R1; Supplementary Fig. 
6) as well as many genes not yet implicated in carcinogenesis. Analysis of 
72 somatic mutations in the non-coding space of experimentally determined
human regulatory regions showed (Supplementary Table 11) a
significant overrepresentation (31.9% versus expected 2.5%, Fisher exact 
test P=2%X10'’) of mutations within retinoblastoma-associated 
protein (RB)-binding sites. Six mutations were predicted to be damaging 
to RB binding (Supplementary Methods and Supplementary Fig. 7). 
This is consistent with observations of frequent functional disruption 
of the RB-regulated cell cycle network in TNBC.
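
The scale of this enrichment can be checked with a back-of-the-envelope calculation. The sketch below is not the Fisher exact test reported above; it assumes a simpler binomial model in which each of the 72 mutations independently falls in an RB-binding site with the expected probability of 2.5%, and it infers the observed count (23) from the reported 31.9%. Function and variable names are illustrative.

```python
from math import comb


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


TRIALS = 72                        # regulatory-region mutations analysed
OBSERVED = round(0.319 * TRIALS)   # count inferred from the reported 31.9%
EXPECTED_RATE = 0.025              # expected fraction in RB-binding sites

tail_probability = binomial_upper_tail(OBSERVED, TRIALS, EXPECTED_RATE)
print(f"{OBSERVED} of {TRIALS} observed; binomial tail = {tail_probability:.2e}")
```

Under this simplified model the tail probability is vanishingly small, in line with the strong overrepresentation described in the text, although the exact value will differ from a Fisher exact test on the full contingency table.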

We next searched for mutation enrichment patterns in three ways: 
by single gene mutation frequency over multiple cases; by the mutation 
frequency over multiple members of a gene family; and by correlating 
mutation status with expression networks. First, similar to other 
studies, p53 is the most frequently mutated gene (Supplementary
Table 12) with 62% of basal TNBC (determined by gene expression 
classification with PAM50 (ref. 16) analysis on RNA-seq expression 
profiles) and 43% of non-basal TNBC cases harbouring a validated 
somatic mutation. We also observed frequent mutations in PIK3CA at 
10.2% (7/65), USH2A (Usher syndrome gene, implicated in actin 
cytoskeletal functions) at 9.2% (6/65), MYO3A at 9.2%, PTEN and 
RB1 at 7.7% (5/65) and a further eight genes (including ATR, UBR5 
(also known as EDD1), COL6A3) at 6.2% (4/65) of cases in the cohort 
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Figure 1 | Distribution of number of validated 


somatic mutations by case over 65 cases.
a, Mutation frequency (basal, red; other, grey).


Patients harbouring known driver gene mutations 
are indicated. b, Case-specific and overall (inset) 
distributions of mutations in CNA classes. AMP, 
amplification; GAIN, single copy gain; HETD, 
hemizygous deletion; HLAMP, high-level 
amplification, HOMD, homozygous deletion; 
NEUT, no copy number change. The number of 
(HOMD, HLAMP) CNAs (black diamonds) and 
percentage genome altered (green circles) are 
indicated. 
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(Fig. 2a). Considering background mutation rates’’, p53, PIK3CA, 
RB1, PTEN, MYO3A and GH1 showed evidence of single gene selection
(q < 0.1) (Supplementary Table 13). Additional recurrent mutations
of note occurred in the synuclein genes (SYNE1 and SYNE2, 9.2%,
6/65, recently implicated in squamous head and neck cancers),
BRCA2 (three cases), and several other well known oncogenes 
(BRAF, NRAS, ERBB2 and ERBB3) with mutations in two cases each. 
Approximately 20% of cases contained examples of potentially 
‘clinically actionable’ somatic aberrations, including BRAF V600E, 
high-level EGFR amplifications and ERBB2 and ERBB3 mutations. 
In the second approach we searched for statistically overrepresented 
gene families and protein functions using the Reactome functional 
protein interaction database (Supplementary Methods). This analysis
quantifies gene family involvement through sparse mutation
patterns in functionally connected genes, which would be statistically 
underrepresented by single gene recurrent mutation analysis. The 
overrepresented pathways (false discovery rate (FDR) < 0.001) 
included p53-related pathways along with chromatin remodelling, 
PIK3 signalling, ERBB signalling, integrin signalling and focal adhesion, 
WNT/cadherin signalling, growth hormone and nuclear receptor co- 
activators, and ATM/RB-related pathways (Fig. 3a and Supplementary 
Table 14). We note that the candidate ‘driver’ MYO3A, a cytoskeleton
motor protein involved in cell shape and motility, relates to several 
pathways upstream and downstream of integrin signalling. The 
mutated genes include extracellular matrix (ECM) interactions 
(laminins, collagens), ECM receptors (integrins), several proteins 
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cases in green, and ER in blue), 
shown as a percentage of cases (in 
parentheses) with one or more 
mutations. *P < 0.05. 
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regulating actin cytoskeleton dynamics (usherin, palladin, multiple 
myosins) and microtubule motor proteins (kinesins) (Fig. 2a). All of 
these contribute to cellular processes that have been functionally impli- 
cated in cancer progression; however, a signature of somatic mutation 
associated with these proteins has not been previously noted in TNBC. 
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Figure 3 | Network analysis of 254 recurrently mutated genes by somatic 
point mutations and indels. a, Case-specific mutations shaded according to 
clonal frequencies in known driver genes, plus genes from integrin signalling 
and ECM-related proteins (laminins, collagens, integrins, myosins and 
dyneins). b, Significantly overrepresented pathways (FDR < 0.001) from 
recurrently mutated genes (see Supplementary Methods). Node shading 
encodes the adjusted P value (q value) of the comparison of the distribution of 
clonal frequencies of mutations in a given pathway to the overall distribution of 
clonal frequencies. A spectrum of higher (red) and lower (yellow) clonal 
frequencies is evident. Letters in parentheses indicate database sources. 
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To confirm the mutational spectrum in the general breast cancer popu- 
lation we re-sequenced all exons of 29 genes in an additional 159 breast 
cancers (82 oestrogen receptor (ER)+ and 77 ER−, tumour and
matched normal) (Fig. 2b), and confirmed that many of the genes 
found in the discovery cohort were recurrently mutated in an addi- 
tional population. Whether this pattern of mutation represents the 
occurrence of disease-modifying mutations, or possibly selection from 
other processes (for example, transcription-related hypermutation) is 
unknown. Interestingly, the enrichment of cytoskeletal functions in the 
somatic aberration landscape is also evident from the copy number and 
alternative splicing landscapes (Supplementary Fig. 8). 

Third, we integrated both the CNA and mutation data with expres- 
sion data to reveal genomic events associated with extreme changes in 
the transcription of interacting genes (Table 1), using a bipartite
graph-based method (driverNet; Supplementary Methods). The 
somatic aberrations showing statistically significant association with 
extreme expression in this analysis (P<0.05) (Table 1 and Sup- 
plementary Table 15) implicate well known oncogenes and tumour 
suppressors (TP53, PIK3CA, NRAS, EGFR, RB1, ATM) and suggest 
several new genes of interest, including PRPS2 (a nucleotide bio- 
synthesis enzyme, rank 7), harbouring homozygous deletions in three 
cases, NR3C1 (a glucocorticoid receptor, rank 10) with SNVs in three
cases, and four PKC-related genes, PRKCZ, PRKCQ, PRKG1 and PRKCE.
The gene networks show a partial overlap with driverNet applied to the
TCGA ovarian high-grade serous data (Supplementary Table 16).

Having identified candidate driver genes and significantly over- 
represented pathways, we asked how these are distributed among 
individual tumours by clustering a pathway-patient-mutation matrix 
(Supplementary Fig. 9). The abundance of implicated pathways can be 


Table 1 | Analysis of the top somatically aberrated genes influencing expression

Rank  Gene      gband     SNV or indel  HLAMP  HOMD  Events  P value
1     TP53      17p13.1   35            0      0     2242    0
2     PIK3CA    3q26.32   7             0      0     441     1 × 10^-4
3     NRAS      1p13.2    2             0      0     271     4 × 10^-4
4     EGFR      7p11.2    1             5      0     220     4 × 10^-4
5     RB1       13q14.2   5             0      5     184     5 × 10^-4
6     PGM2      4p14      1             0      1     172     5 × 10^-4
7     PRPS2     23p22.2   0             0      3     171     5 × 10^-4
8     PTEN      10q23.31  5             0      3     150     5 × 10^-4
9     PRKCE     2p21      0             0      1     136     7 × 10^-4
10    NR3C1     5q31.3    3             0      0     130     7 × 10^-4
11    CREBBP    16p13.3   1             0      1     119     8 × 10^-4
12    CS        12q13.2   1             0      0     108     0.0011
13    MAN2A2    15q26.1   2             0      1     104     0.0012
14    HMGCS2    1p12      1             2      0     100     0.0013
15    HEXA      15q24.1   2             1      0     97      0.0013
16    ADCY9     16p13.3   2             1      0     91      0.0017
17    OR4N4     15q11.2   0             0      5     90      0.0017
18    LCLAT1    2p23.1    0             0      1     85      0.002
19    DGKI      7q33      2             0      0     82      0.0022
20    CYP2A6    19q13.2                 0      0     80      0.0024
21    JAK1      1p31.3                  0      0     78      0.0026
22    POLR1A    2p11.2    2             0      0     78      0.0026
23    PLD1      3q26.31                 0      0     69      0.0038
24    IDH3B     20p13                   0      1     68      0.004
25    PAPSS2    10q23.2   0             0      3     67      0.0041
26    PRKX      23p22.33  0             0      2     65      0.0046
27    TPH2      12q21.1                 0      0     65      0.0046
28    UGT2B17   4q13.2    0             0      1     63      0.0053
29    RRM2      2p25.1                  0      0     57      0.0072
30    ATM       11q22.3                 0      0     55      0.0084
31    CLCA1     1p22.3    2             0      0     54      0.009
32    PRKCZ     1p36.33                 0      0     53      0.0095

Rank, derived by the driverNet algorithm (see Supplementary Methods); gene, somatically aberrated
gene; gband, chromosomal band containing gene; SNV or indel, the number of cases harbouring an
SNV or indel in the gene; HLAMP, the number of cases harbouring a predicted high-level amplification;
HOMD, the number of cases harbouring a predicted homozygous deletion; events, number of gene
expression outliers (see Supplementary Methods) coincident with a genomic aberration and where the
outlying gene is connected to the aberrated gene; P value, statistical significance based on a randomly
generated background distribution (Supplementary Methods).
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seen to be only partially related to the total number of mutations in a 
case, groups 1 and 2 having on average fewer mutations per case. The 
frequent involvement of pathways with p53, PTEN and PIK3CA as 
members, is noted (Supplementary Fig. 9); however, the case group- 
ings also vary by the progressive inclusion of additional pathways (for 
example, WNT signalling, integrin signalling, ERBB signalling, hypoxia 
and PI3K). More than two thirds of cases contained one or more muta- 
tions in the actin/cytoskeletal functions group of genes (Supplementary 
Fig. 9). Some 12% of cases did not contain somatic aberrations in any of 
the frequent drivers or cytoskeletal genes (Supplementary Table 12). 
This suggests that primary TNBCs are mutationally heterogenous from 
the outset, with some patients’ tumours having a small number of 
implicated pathways and few mutations, whereas other patients present 
with tumours containing extensive mutation burdens and multiple 
pathway involvement. 

Motivated by the observation that early primary TNBCs show a 
wide variation of mutation content, we asked whether the clonal 
composition of these primary cancers is similarly varied. We and 
others have shown how deep-frequency measurements of allelic
abundance can be used to study tumour clonal evolution. Clonal 
mutation frequency, a compound measure of clonal complexity, 
(Fig. 4a) can be estimated from allele abundance, once the influence 
of copy number states, regional loss of heterozygosity (LOH state) and 
tumour cellularity have been considered (although we note that 
approximately 68% of SNVs in this study are in diploid, neutral 
regions). To extend allelic abundance measurements to estimation of 
clonal frequencies, we implemented a Dirichlet process clustering 
model (pyclone; Supplementary Methods and Supplementary Fig. 10) 
that simultaneously estimates the genotype and clonal frequency given 
a list of deeply sequenced mutations and their local copy number and 
heterozygosity contexts. 
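
To make the relationship between allelic abundance and clonal frequency concrete, here is a minimal point-estimate sketch. It is not the Dirichlet process model described above: it assumes a single known tumour copy number at the locus, a known number of mutated copies per mutated cell, and a known cellularity, and it ignores sequencing noise. Under those assumptions the expected variant allele fraction is f = φ·t·m / (t·C_T + (1 − t)·C_N), which the function inverts for the clonal frequency φ. All names and example values are hypothetical.

```python
def clonal_frequency(vaf, cellularity, tumour_copies=2, mutated_copies=1,
                     normal_copies=2):
    """Point estimate of the fraction of tumour cells carrying a mutation.

    vaf            -- observed variant allele fraction (0..1)
    cellularity    -- fraction of cells in the sample that are tumour (0..1]
    tumour_copies  -- total copy number at the locus in tumour cells
    mutated_copies -- copies carrying the mutation in each mutated cell
    normal_copies  -- copy number at the locus in normal cells
    """
    if not 0.0 < cellularity <= 1.0:
        raise ValueError("cellularity must be in (0, 1]")
    if not 1 <= mutated_copies <= tumour_copies:
        raise ValueError("mutated_copies must be between 1 and tumour_copies")
    # Average number of alleles per cell in the mixed tumour/normal sample.
    total_alleles = (cellularity * tumour_copies
                     + (1.0 - cellularity) * normal_copies)
    estimate = vaf * total_alleles / (cellularity * mutated_copies)
    # Noise can push the raw estimate above 1; cap it at "all tumour cells".
    return min(estimate, 1.0)


# Heterozygous SNV in a diploid, copy-neutral region, 80% tumour cellularity.
diploid_estimate = clonal_frequency(vaf=0.2, cellularity=0.8)
print(f"estimated clonal frequency: {diploid_estimate:.2f}")
```

A full model additionally has to infer the mutation genotype and group mutations with similar frequencies, which is what the clustering step in the text addresses.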

Using the set of deeply sequenced (median 20,000×), validated
SNVs, our analysis revealed (Fig. 4b) that groups of mutations within 
individual cases have different clonal frequencies, indicative of distinct 
clonal genotypes. Remarkably, the tumours exhibit a wide spectrum of 
modes over clonal frequencies (Fig. 4b and Supplementary Fig. 11), 
with some cases showing only one or two frequency modes (Fig. 4b), 
indicating a smaller number of clonal genotypes, whereas other 
tumours exhibit multiple clonal frequency modes, indicating more 
extensive clonal evolution. Consistent with early ‘driver gene’ status,
mutations in known tumour suppressors such as p53 tend to occur in 
the highest clonal frequency group in most tumours. However, in some 
cases (for example, SA219, SA236; Fig. 4b, Supplementary Fig. 11) p53 
resides in lower-abundance clonal frequency groups (Supplementary 
Fig. 12 and Fig. 3a), suggesting that it was not the founding event. 
Although the number of clonal frequency modes tends to increase with 
the number of mutations, the relationship is not strictly linear (Fig. 4c). 
To determine whether basal and non-basal cancers differ in their 
clonality, we compared the distribution of clonal modes (clusters) by 
case and as an overall distribution, and note that basal TNBCs have 
more clonal frequency modes than non-basal TNBCs (Fig. 4c). Both of 
these distributions emphasize a key observation; namely, that at the 
time of diagnosis TNBCs already display a widely varying clonal evolu- 
tion that mirrors the variation in mutational evolution. 

Finally, we asked where key pathways appear in the distribution of 
clonal frequency groups. We examined the clonal frequency of genes 
in each pathway and ascertained if there was a deviation away from the 
distribution of clonal frequency for all mutations. As expected, 
pathways involving p53 and PIK3CA showed significantly skewed 
distributions (Wilcoxon, q < 0.01; Fig. 3b and Supplementary Fig. 12) 
towards higher clonal frequencies, consistent with their roles in early 
tumorigenesis (Fig. 3a and Supplementary Table 17). Intriguingly, 
pathways with cytoskeletal genes such as myosins, laminins, collagens 
and integrins tend to have lower median clonal frequencies, suggesting 
that somatic mutations in these genes are acquired much later (Fig. 3b). 
Notably, the median clonal frequency for Reactome pathway ‘p53 
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Figure 4 | Clonal evolution in TNBC. a, Schematic representation of 
integration of CNA, LOH, allelic abundance measurements and normal cell 
contamination for clonal frequency estimation using a Dirichlet process (DP) 
model (left). Example of a mixture of three clonal genotypes composed of four 
mutations (A, B, C, D) and their resulting clonal frequencies. b, Estimated 
clonal frequencies for four cases are shown as the distribution of posterior 
probabilities from the pyclone model (Supplementary Methods). Clonal 
frequency distributions are coloured by their frequency group membership. 
c, Left, relationship of mutation abundance (synonymous (Syn) and non- 
synonymous (Non-syn)) and the inferred number of clonal clusters. Middle, 
distribution and kernel density (red line) of the number of inferred clonal 
clusters over 54 TNBCs. Right, kernel density distribution of clonal clusters for 
basal (red) and non-basal (grey) tumours. 


pathway feedback loops’, including 46 mutations in ATM, ATR, 
NRAS, PIK3CA, PTEN, SIAH1 and p53, was 73% (Wilcoxon,
q = 0.0007), whereas ‘integrin cell surface interactions’, including 23 
mutations in integrin, laminin and collagen genes, had a median clonal 
frequency of 42% (Wilcoxon, q = 0.9569). 
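
The comparison of a pathway's clonal frequencies against those of all mutations can be sketched with a rank-sum (Mann-Whitney) statistic. The version below is a simplified stand-in: it assumes a normal approximation without tie correction and performs no multiple-testing adjustment, whereas the q values above imply an adjustment step that is not shown here. The example frequencies are invented.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(sample, reference):
    """Return (U, two-sided p) for sample versus reference.

    U counts the (sample, reference) pairs in which the sample value is
    larger, with ties counted as one half.
    """
    n1, n2 = len(sample), len(reference)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(sample) + list(reference))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return u, erfc(abs(z) / sqrt(2.0))


# Hypothetical clonal frequencies: one pathway versus other mutations.
pathway = [0.8, 0.9, 0.7]
background = [0.1, 0.2, 0.3, 0.4]
u_statistic, p_value = rank_sum_test(pathway, background)
print(f"U = {u_statistic}, two-sided p = {p_value:.3f}")
```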
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Primary TNBCs are still treated as if they were a single disease entity, 
yet it is clear they do not behave as a single entity in response to current 
therapies. Here we show for the first time, using next-generation 
sequencing mutational profiling methods, that treatment-naive 
TNBCs display a complete spectrum of mutational and clonal evolu- 
tion, with some patients’ tumours showing only a few somatic coding 
sequence point mutations with a limited number of molecular pathways 
implicated, whereas other patients’ tumours exhibit considerable addi- 
tional mutational involvement. Moreover, the clonal heterogeneity of 
these cancers is also a continuum, with some patients presenting with 
low-clonality cancers and other cases exhibiting more extensive clonal 
evolution at diagnosis. In this respect, the basal expression subtype of 
TNBCs also tends to show higher clonality at diagnosis, although the 
relationship is not exact. 

In clonally evolving tumours, identification of genes by single gene 
mutation frequency measurements will probably favour early driver 
genes, because the subsequent involvement of multiple additional 
pathways during tumour progression is unlikely to be observed as a 
frequent single gene mutation. The clonality analysis emphasizes this 
point: known drivers such as p53, PIK3CA and PTEN have among the 
highest clonal frequencies, whereas mutations in cell shape/motility 
and ECM-signalling genes appear in the lower clonal frequency 
groups, distributed over many genes. Although p53 somatic mutations 
are clearly early events, the clonal frequencies observed in some TNBC 
suggest that they are not always the first event, raising a question about 
what drives early clonal expansion in some of these cancers. Our 
findings suggest that each TNBC at the time of primary diagnosis 
may be at a very different phase of molecular progression, with possible
implications for approaches to the biology of low clonality versus high 
clonality primary tumours. 


METHODS SUMMARY 


The genomes and transcriptomes of 104 TNBCs were profiled with Affymetrix 
SNP6.0 arrays (all cases), RNA-seq (80 cases; Illumina GAII), and whole exome/
genome sequencing (65 cases; tumour and normal DNA). Exomes were obtained 
using Agilent’s Human All Exon SureSelect Target Enrichment System v.1 fol- 
lowed by Illumina GAII sequencing, and whole genomes were sequenced using 
Life Technologies SOLiD system. Data were analysed using computational 
approaches to detect somatic SNVs, indels, copy number alterations, gene
fusions and gene expression patterns. Predictions were then validated using 
orthogonal experimental assays, including targeted ultra-deep amplicon sequencing 
of SNVs to ~20,000 redundancy. We determined single genes under selection 
using a statistical approach that considers patient-specific background mutation and 
transition/transversion rates. Mutations predicted to alter transcriptional profiles 
were determined using an integrated bipartite graph-based method (driverNet) that 
associates genomic aberrations with outlying expression patterns informed by pre- 
defined pathway gene sets. Disrupted pathways were determined using the Reactome 
FI Cytoscape plugin. Clonal analysis was performed (cases with >10 mutations) 
using a Dirichlet process statistical model that simultaneously estimates clonal fre- 
quencies and mutation genotype given deeply sequenced somatic SNVs and copy 
number estimates. Experimental assays and analytical methodology are detailed in 
the Supplementary Information. 
An RNA interference screen uncovers a new molecule
in stem cell self-renewal and long-term regeneration 


Ting Chen, Evan Heller, Slobodan Beronja, Naoki Oshimori, Nicole Stokes & Elaine Fuchs


Adult stem cells sustain tissue maintenance and regeneration 
throughout the lifetime of an animal. These cells often reside
in specific signalling niches that orchestrate the stem cell’s
balancing act between quiescence and cell-cycle re-entry based on
the demand for tissue regeneration. How stem cells maintain
their capacity to replenish themselves after tissue regeneration is 
poorly understood. Here we use RNA-interference-based loss-of- 
function screening as a powerful approach to uncover tran- 
scriptional regulators that govern the self-renewal capacity and 
regenerative potential of stem cells. Hair follicle stem cells provide 
an ideal model. These cells have been purified and characterized 
from their native niche in vivo and, in contrast to their rapidly 
dividing progeny, they can be maintained and passaged long-term 
in vitro. Focusing on the nuclear proteins and/or transcription
factors that are enriched in stem cells compared with their
progeny, we screened ~2,000 short hairpin RNAs for their effect
on long-term, but not short-term, stem cell self-renewal in vitro. 
To address the physiological relevance of our findings, we selected 
one candidate that was uncovered in the screen: TBX1. This tran- 
scription factor is expressed in many tissues but has not been 
studied in the context of stem cell biology. By conditionally ablat- 
ing Tbx1 in vivo, we showed that during homeostasis, tissue regen- 
eration occurs normally but is markedly delayed. We then devised 
an in vivo assay for stem cell replenishment and found that when 
challenged with repetitive rounds of regeneration, the Tbx1- 
deficient stem cell niche becomes progressively depleted. Addressing 
the mechanism of TBX1 action, we discovered that TBX1 acts as an 
intrinsic rheostat of BMP signalling: it is a gatekeeper that 
governs the transition between stem cell quiescence and prolifera- 
tion in hair follicles. Our results validate the RNA interference 
screen and underscore its power in unearthing new molecules that 
govern stem cell self-renewal and tissue-regenerative potential. 

Stem cell self-renewal is the process by which stem cells proliferate 
and generate more stem cells. This process requires control of the cell 
cycle and maintenance of the undifferentiated state. Embryonic stem 
cells are refractory to most proliferation checkpoints, and they typically
promote self-renewal by suppressing differentiation. By contrast, the
few established regulators of self-renewal in adult stem cells function by
regulating cell-cycle progression. A prerequisite to unlocking the key
to regenerative medicine is to dissect the complex mechanisms govern- 
ing stem cell self-renewal in adult tissues. 

With their enormous capacity for tissue regeneration, hair follicles 
offer an ideal system to explore these mechanisms. Hair follicle stem 
cells (HF-SCs) become activated early in each hair growth cycle, when a
few of these cells exit their niche (called the bulge) to generate a new hair 
follicle. The differentiation of stem cells into lineage-restricted progeny 
probably requires micro-environmental stimuli that are not present in 
the stem cell niche, because this process happens gradually along the 
follicle outer root sheath. The stepwise process culminates at the
base of the mature follicle, where committed transit-amplifying 
matrix cells differentiate into the six lineages that are involved in hair 
production. 


Following their activation, stem cells within the bulge and its vicinity 
(the upper outer root sheath, which becomes the bulge in the next 
cycle) briefly self-renew, replenishing the expended stem cells and 
ensuring long-term tissue regeneration. Niche HF-SCs also proliferate
following injury and repair wounds. Another feature that
distinguishes HF-SCs from their committed progeny is their ability
to be propagated for at least five passages in vitro, reflecting their
capacity for long-term proliferative potential.

In the current study, we surmised that there might be two sources for 
finding intrinsic factors responsible for maintaining ‘stemness’ inside 
and outside the stem cell niche: self-renewal factors that have been 
identified in other stem cell studies; and nuclear proteins that we found 
to be enriched twofold in stem cells relative to their transit-amplifying 
progeny (Supplementary Fig. 1a, b). Focusing on about 400 such candidates,
we devised an in vitro RNA interference (RNAi) screen for long-term
versus short-term self-renewal (Fig. 1a). By choosing genes whose
expression was enriched in stem cells relative to committed proliferative 
progeny, this pool of candidates should not contain housekeeping genes 
and general proliferation-associated genes. However, if short hairpin 
RNAs (shRNAs) target a gene that is essential for long-term but not 
short-term self-renewal, then cells expressing this gene should persist 
during early passages but then decrease in number or disappear with 
sequential passaging. Operating on this premise, we transduced puri- 
fied primary HF-SCs in triplicate with a lentiviral pool encoding con- 
trol (scramble) shRNAs and a pool of 2,035 candidate shRNAs (about 
five per gene) such that, on average, each stem cell expressed a single 
shRNA (Supplementary Fig. 1c). The transduced stem cells were cultured
and, at 24 h and following each passage, shRNAs were amplified
from the surviving cells and subjected to high-throughput sequencing. 

Data are shown for passage 1 (P-1) and P-5 (Fig. 1b, c, Supplemen- 
tary Figs 2 and 3a, and Supplementary Tables 1 and 2). More than 96% 
of the initial shRNAs were detected at 24 h after transduction, and
these shRNAs were used as a reference for changes in shRNA representation.
Consistent with our strategic exclusion of housekeeping
genes and general proliferation-associated genes, most cells that 
harboured shRNAs survived the first passage. By contrast, after five 
passages, many shRNAs were depleted or enriched, suggesting that the 
transduced cells had different long-term proliferative potentials. Using 
unsupervised hierarchical clustering, triplicates of individually trans- 
duced and passaged cells behaved strikingly similarly, suggesting that 
these changes reflected bona fide alterations in stem cell character. 

Parallel screens with fibroblasts weeded out shRNAs corresponding 
to cell-survival genes such as Bcl2, which were selected against after five 
passages in both HF-SCs and fibroblasts (Fig. 1c, Supplementary Fig. 3b 
and Supplementary Table 3). Our refined short list of self-renewal 
candidates contained those whose cognates all showed similar trends 
and for which two or more shRNAs per gene displayed specific changes 
in P-5 stem cell cultures but not in P-1 stem cell cultures or in P-5 
fibroblasts (Supplementary Fig. 2 and Supplementary Table 1). 
Category I shRNAs (Fig. 1d) were maintained in P-1 stem cell cultures 
but were underrepresented by more than 90% at P-5, meeting the 
criteria for an shRNA that suppresses a long-term self-renewal gene. 
Figure 1 | In vitro RNAi screen for genes involved in stem cell long-term
self-renewal. a, In vitro RNAi screening strategy. b, Unsupervised
hierarchical clustering of screening replicates. c, Scatter plots of
normalized reads per shRNA between 24 h post infection and after one passage
(P-1) or five passages (P-5) for HF-SCs (left and centre) and after five
passages for fibroblasts (right). The data shown are from one replicate of
each screening, highlighting the genes whose corresponding shRNAs were
specifically depleted in the long-term passaging of stem cells (red dots)
and control genes (black dots). The blue line is the diagonal line with a
ratio of 1.0. The red dashed line shows the cut off for 1.7-fold change.
d, Screening statistics. e, Progressive selection against hairpins that
target putative long-term self-renewal genes. Data are presented as
mean ± s.d.; n = 3. m.o.i., multiplicity of infection; RFP, red fluorescent
protein; SC, stem cell; TA, transit-amplifying; vs, versus.

[Panel c plots: HF-SC screen P-5 (long-term renewal), HF-SC screen P-1
(short-term renewal) and fibroblast screen P-5 (general cell survival);
x axis, log2 (reads at 24 h); labelled points include scramble, Tbx1, Hmga2
and Runx1.]

Panel d: Summary of shRNA hits (16% of total) from the screen

Category of shRNAs              Number   Screen (%)   Known adult SC self-renewal genes
I. Strong depletion (>90%)        75        3.8       Hmga2, Runx1
II. Medium depletion (60-90%)    229       11.8
III. Enrichment (>1.7 fold)        7        0.4
Total                            311       16


Representing only 3.8% of the initial pool, category I included shRNAs 
targeting Hmga2, which is required for neural stem cell self-renewal,
and Runx1, which promotes HF-SC proliferation.
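
The category boundaries in Fig. 1d can be read as a rule on the ratio of an shRNA's normalized reads at P-5 to its reads at 24 h. The following Python sketch is illustrative only: the function name and the exact handling of boundary values are assumptions, and it ignores the additional filters applied in the screen (maintenance at P-1, no change in fibroblasts, and agreement between two or more shRNAs per gene).

```python
def classify_shrna(ratio_p5_to_24h: float) -> str:
    """Assign an shRNA to a Fig. 1d category from its P-5 / 24 h read ratio.

    Hypothetical helper: thresholds follow the text (>90% depletion,
    60-90% depletion, >1.7-fold enrichment); boundary handling is assumed.
    """
    if ratio_p5_to_24h < 0:
        raise ValueError("read ratio cannot be negative")
    depletion = 1.0 - ratio_p5_to_24h  # fraction of representation lost
    if depletion > 0.90:
        return "I (strong depletion)"
    if depletion >= 0.60:
        return "II (medium depletion)"
    if ratio_p5_to_24h > 1.7:
        return "III (enrichment)"
    return "not a hit"


# Illustrative ratios, not data from the screen.
for ratio in (0.05, 0.25, 1.0, 2.0):
    print(ratio, classify_shrna(ratio))
```

A ratio of 0.05 corresponds to 95% depletion and would fall in category I under these assumptions.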

Real-time PCR (rtPCR) of transcripts targeted by six of the most 
effective category I shRNAs confirmed that each shRNA blocked the 
expression of its intended target (Supplementary Fig. 4). Moreover, 
stem cells that were individually transduced with Hmga2, Runx1 or 
Tbx1 shRNA were progressively selected against over time (Fig. 1e).
The transcription factor TBX1 was particularly intriguing because it 
has been implicated in tissue formation in other organs. We
selected it as our model for in vivo testing of the functional relevance 
of our RNAi screen. 

rtPCR and epigenetic chromatin immunoprecipitation followed by 
DNA sequencing (ChIP-seq) analyses of purified hair follicle populations
revealed that Tbx1 was transcribed at higher levels in stem cells
than in any of their progeny (Fig. 2a, b and Supplementary Fig. 5a). 
In vivo, the developmental expression of TBX1 protein most closely 
resembled that of two essential HF-SC transcription factors, SOX9 and 
LHX2 (Fig. 2c, d). The adult pattern of expression resembled that of 
CD34, which is a cell surface marker of HF-SCs (Fig. 2e, f). Nuclear 
TBX1 was not detected in self-renewing transit-amplifying cells or in 
terminally differentiating cells (Fig. 2g). Notably, in contrast to some 
other HF-SC transcription factors, TBX1 was also maintained in stem 
cells in long-term cultures. 
To evaluate TBX1 function in vivo, we conditionally targeted Tbx1
(Tbx1-cKO) in the skin epithelium of embryonic day 15.5 mice.
Tbx1-cKO mice were viable, and hair follicle morphogenesis appeared 
to be normal (Supplementary Fig. 5b, c). We tested for possible defects 
in stemness by analysing tissue regeneration during the normal hair 
cycle (Supplementary Fig. 6). For this purpose, same-sex littermates 
were shaved at the normal onset of the first hair cycle (postnatal day 21, 
P21) and the second hair cycle (P60). In both cases, Tbx1-cKO hair 
follicles remained quiescent longer than normal, but they eventually 
cycled. Maturation/differentiation was unaffected, as shown by the 
development of normal hair types and lengths. 

Self-renewal occurs briefly after anagen onset, replenishing the stem 
cells that are used during initiation. To challenge stem cells to sustain
long-term tissue regenerative potential, we repeatedly depilated the 
hair coat, a process that removes the old hair along with tightly adhering
niche signalling cells that maintain stem cell quiescence
(Fig. 3a). After depilation, more than 80% of the stem cells remained 
viable in their niche, where they became activated to enter a new hair 
cycle (Supplementary Fig. 7). 

Wild-type (WT) HF-SCs survived each round of depilation- 
induced hair regeneration, indicating the robust ability to sustain 
self-renewal and long-term tissue regeneration. By contrast, after five 
rounds, the Tbx1-cKO stem cell numbers had declined by more than 
70% (P < 0.001) (Fig. 3b, c). Their steady depletion was accompanied 
Figure 2 | The transcription factor TBX1 is highly enriched in stem cells in
vivo and in vitro. a, b, Tbx1 expression in HF-SCs, shown compared with
established HF-SC transcription factor genes, as determined by rtPCR. The
mRNAs were isolated from fluorescence-activated cell sorting (FACS)-purified
stem cells or other populations, as indicated. Data are presented as mean ± s.d.;
n = 3. c-g, Nuclear TBX1 in the skin is restricted to developing and postnatal
HF-SCs. The back skins of mice were processed for immunofluorescence at
embryonic day 15.5 (E15.5), postnatal day 21 (P21) and P25. The white
dashed line indicates the dermal-epithelial boundary, and the asterisk denotes
hair shaft autofluorescence. Nuclei are shown in blue. Scale bars, 30 μm.
β4, β4-integrin; Bu, bulge stem cells; DP, dermal papilla; Epi, epidermis; HF, hair
follicle; HG, hair germ (early stem cell progeny); Mx, matrix (committed TA
progeny); ORS, outer root sheath; Pcad, P-cadherin.


by a thinning of the hair coat and a reduction in hair follicle density 
(Fig. 3d, e and Supplementary Fig. 8). Histological analysis revealed 
that many Tbx1-cKO hair follicles were dormant and had lost their 
stem cell niche, retaining only sebaceous glands and dermal papillae.
However, the few hair follicles that cycled appeared morphologically 
normal, reflecting the presence of active stem cell niches (Fig. 3e). 

Similar results were obtained when hair cycles were monitored 
during natural ageing. Although the intervals between the hair cycles 
were longer than those in the depilation procedure, the Tbx1-cKO 
stem cell niche residents had declined by about 30% after 1 year 
(Fig. 3f). Thus, Tbx1-null stem cells seem to be specifically impaired 
in their long-term ability to replenish their niche during normal and 
depilation-induced tissue regeneration. 

We used 5-bromodeoxyuridine (BrdU) incorporation to define the 
brief window of bulge stem cell proliferation that occurs following 
depilation. WT stem cell proliferation peaked at day 3 after depilation, 
and the cells returned to quiescence by day 7. Tbx1-cKO stem cells also 
proliferated but to a lesser extent during this time period (Fig. 4a). 
Within 2 to 3 days of depilation, only about 25% of Tbx1-null stem 
cells were BrdU-positive, whereas about 70% of WT stem cells were 
BrdU-positive (Fig. 4a). This proliferative decrease was verified by 
DNA-content-based cell-cycle analysis of purified stem cells, which 
showed that the decrease was accompanied by fewer stem cells being 
present in the niche (Fig. 4b). As discussed earlier, most hair follicles 
eventually produced hairs of WT length, reflecting an otherwise 
normal lineage program. 

We observed a similar trend when we monitored the normal hair 
cycle. In this case, fewer stem cells were expended than during depila- 
tion; thus, the demand for self-renewal was lower, as reflected by the 
natural bulge niche having a lower proliferative activity than the 
depilation-induced WT bulge niche. Consistent with a role for TBX1 
in HF-SC self-renewal, the overall proliferative activity within the 
Tbx1-null niche was less than in the WT niche, and that in the natural 
niche was less than in the depilation-induced niche (Supplementary 
Fig. 9). 

To understand how these differences arise, we transcriptionally pro- 
filed messenger RNAs that were isolated from purified HF-SCs 2 days 
after depilation (Fig. 4c, Supplementary Fig. 10 and Supplementary 
Table 4). Bioinformatic analysis using the Database for Annotation, 


Figure 3 | Tbx1-null stem cells fail in an in vivo
assay for stem cell self-renewal and long-term
tissue regeneration. a, Schematic of the in vivo
long-term hair regeneration assay. b, Decline in
stem cell number with sequential rounds of hair
regeneration in Tbx1-cKO hair follicles. c, FACS
quantifications of stem cells after five rounds of
hair regeneration; n = 11 for WT, n = 12 for cKO.
d, Close up of skin surface in the fifth cycle of hair
regeneration. d, day. e, Representative
haematoxylin and eosin images and quantifications
of follicle densities in the fifth cycle of hair
regeneration; n = 4. f, Decline in stem cell number
during normal ageing (1 year old) in Tbx1-cKO
skin; n = 5. c, e, f, Box-and-whisker plots: mid-line,
median; box, 25th to 75th percentiles; and
whiskers, minimum and maximum. **, P < 0.001.
b, d, e, f, The white dashed line indicates the
dermal-epithelial boundary, and nuclei are shown
in blue. Scale bars, 500 μm (d) and 30 μm (b, e, f).
Visualization and Integrated Discovery (DAVID) functional gene
annotation tool uncovered mostly cell-cycle regulators in the 123 genes 
that were downregulated by 1.8 fold or more in Tbx1-null stem cells 
compared with WT stem cells. The 188 genes that were upregulated by 
1.8 fold or more were enriched for genes implicated in bone morpho- 
genetic protein (BMP) signalling (Fig. 4c, red), including Bmp2, which 
encodes a secreted ligand for BMP receptors, and Id genes, which are 
major targets of the transcriptional effectors of BMPs, namely SMAD4 
in complex with phosphorylated SMAD1 (pSMAD1), pSMAD5 or 
pSMAD8 (denoted pSMAD1/5/8). rtPCR confirmed that Tbx1-cKO HF-SCs
upregulated Id1 and Id2 (refs 22, 23) (Fig. 4d).

In cardiomyocytes, TBX1 seems to suppress BMP signalling by 
competitively interfering with SMAD4 for pSMAD1/5/8 binding.
Consistent with this idea, overexpression of a TBX1-green fluorescent 
protein (GFP) fusion protein in WT keratinocytes significantly sup- 
pressed BMP-induced Id1 transcription in vitro (Fig. 4e). Similar 
effects, albeit slower and weaker, were observed for Id2. 

Transgenic overactivation of the BMP circuitry results in hair coat 
thinning with age. If TBX1-deficient HF-SCs have a heightened
sensitivity to BMP signalling, then BMP inhibitors might ameliorate 
the proliferative defect. We tested this hypothesis by plucking hair 
follicles from mice and then injecting them intradermally with beads 
soaked in the BMP antagonist noggin. Within 3 days, Tbx1-cKO
HF-SC proliferation had been restored to near WT levels (Fig. 4f,
P < 0.001, and Supplementary Fig. 11). As reflected by the bulge size,
the expanded HF-SC pool was sustained throughout the hair cycle. 
However, additional treatments with noggin were necessary to main- 
tain the stem cell pool through multiple rounds of depilation-induced 
hair regeneration. Interestingly, the self-renewal of TBX1-deficient 
stem cells was also elevated in vitro when BMP signalling was impaired 
by ablation of the BMP receptor BMPRIA (Fig. 4g). Together, these 
findings are consistent with a BMP-induced proliferative defect in 
Tbx1-null stem cells. 

Given these inverse links between TBX1 and BMP signalling, it 
seemed paradoxical that Smad1 shRNAs surfaced in our self-renewal 
screen (Supplementary Table 1). Further analyses revealed that even 
though these shRNAs depleted Smad1 transcripts and SMAD1 pro- 
tein, the SMAD1/5/8 target genes Id1, Id2 and Id3 were still expressed. 
Moreover, the transduced cells still responded to BMP signalling, as 
judged by reporter assays (Supplementary Fig. 12). 

HF-SCs reside in a WNT-restricted, BMP-high microenvironment
in which they must self-renew to replenish the stem
cell pool. Therefore, HF-SCs must have an intrinsic mechanism to 
lower the BMP signalling threshold, and this mechanism fails to occur 
in the absence of TBX1. Entering the hair cycle also necessitates 
decreased BMP signalling; however, in this case, the proliferation is 
fuelled by early progeny (hair germ) with naturally low TBX1 levels 
that are present at the bulge base. Because, paradoxically, hair cycle


Figure 4 | TBX1 controls stem cell proliferation 
initiation was delayed in Tbx1-null mice, we surmise that the hair germ
may be negatively influenced by BMP2 or other local effectors that are 
secreted by Tbx1-null HF-SCs. 

In summary, we have discovered that TBX1 functions in the replenish- 
ment of HF-SCs during tissue regeneration. Our RNAi screens excluded 
roles for TBX1 in cell cycling, housekeeping and survival. The finding 
that once initiated, Tbx1-null hair cycles have a normal progression also 
ruled out roles for TBX1 in cell-fate determination or lineage progres- 
sion. Instead, the TBX1 defect seems to be rooted in diminished stem 
cell self-renewal, coupled with enhanced sensitization to intrinsic BMP 
signalling. Together, these result in progressive HF-SC depletion and 
thinning of the hair coat. Although the effects of TBX1 are likely to be 
more complex than we have shown, its ability to intrinsically control 
these features poises it in the middle of a balancing act with the micro- 
environment to control stem cell behaviour in tissue homeostasis. 


METHODS SUMMARY 

RNAi screen. Candidate gene selection was based on previously published microarray
analyses. HF-SCs isolated by fluorescence-activated cell sorting (FACS)
were infected with an shRNA library (carried by TRC lentiviruses) targeting a set 
of ~400 candidate genes (2,035 shRNAs, with approximately 5 shRNAs per gene) 
to a final infection rate of approximately 20% (that is, about one virus per five HF- 
SCs). Twenty-four hours after infection, half of the infected HF-SCs were col- 
lected, and the other half were plated onto mitomycin-C-treated fibroblast feeder 
layers. Each week, nearly confluent cultures were trypsinized and replated (one 
passage). At each passage, a fraction of the cells were processed for genomic DNA 
isolation. Primers, including adaptors for Solexa sequencing, were used to amplify 
shRNA-encoding sequences from genomic DNA. Following the PCR amplifica- 
tion of shRNAs, sequencing was performed on an Illumina/Solexa Genome 
Analyzer II according to the manufacturer’s protocols. Analyses and plots of 
DNA sequencing data were performed in the R statistical environment. All 
shRNA identities, as well as primary screening data, are listed in Supplementary 
Tables 1-3. 
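
The choice of an infection rate of about 20% can be rationalized with a Poisson model of viral integration. This model is an assumption made here for illustration and is not a calculation reported in the text; the function name is hypothetical. Under that assumption, the sketch estimates the implied multiplicity of infection and the fraction of infected cells that carry exactly one virus.

```python
import math


def single_integration_fraction(infected_fraction: float) -> tuple[float, float]:
    """Return (m.o.i., fraction of infected cells carrying exactly one virus).

    Assumes the number of viruses per cell is Poisson distributed.
    """
    if not 0.0 < infected_fraction < 1.0:
        raise ValueError("infected_fraction must be strictly between 0 and 1")
    # P(at least one virus) = 1 - exp(-moi)  =>  moi = -ln(1 - infected_fraction)
    moi = -math.log(1.0 - infected_fraction)
    # P(exactly one virus) = moi * exp(-moi)
    exactly_one = moi * math.exp(-moi)
    return moi, exactly_one / infected_fraction


moi, single = single_integration_fraction(0.20)
print(f"m.o.i. = {moi:.3f}; singly infected among infected cells = {single:.1%}")
```

With a 20% infection rate the implied m.o.i. is about 0.22, in line with "about one virus per five HF-SCs", and roughly nine in ten infected cells would carry a single shRNA under this model.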

Animal studies. Tbx1^fl/fl mice were obtained from A. Baldini. To create conditional
knockout mice, we mated hemizygous K14-Cre (CD1) mice with homozygous
Tbx1^fl/fl (C57BL/6) mice; F1 K14-Cre/Tbx1^fl/+ (CD1/C57BL/6) progeny
were subsequently bred with homozygous Tbx1^fl/fl mice to generate
K14-Cre;Tbx1^fl/fl mice at a 25% Mendelian frequency. Depilation of mid-dorsal hair
follicles was achieved on anaesthetized mice to provide a proliferative stimulus and 
to synchronize a population of anagen hair follicles. The mid-dorsum was coated 
with molten wax, which was peeled off after hardening. BrdU (Sigma-Aldrich) 
pulse experiments involved intraperitoneal injections (50 μg BrdU per g body
weight) twice a day. After BrdU pulses for the indicated times, animals were killed. 
The skins were embedded in OCT compound, frozen in dry ice, sectioned using a 
cryostat (Richard-Allan Scientific) and stained for immunofluorescence using 
antibodies specific for BrdU. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


RNAi screen. Candidate gene selection was carried out based on previously 
published microarray analyses. Fluorescence-activated cell sorting
(FACS)-isolated HF-SCs were infected with an shRNA library (carried by TRC lentiviruses)
targeting a set of ~400 candidate genes (2,035 shRNAs, with approximately 5 
shRNAs per gene for most, if not all, candidates) to a final infection rate of approxi- 
mately 20% (that is, about one virus per five HF-SCs). Twenty-four hours after 
infection, half of the infected HF-SCs were collected, and the other half were plated 
onto mitomycin-C-treated fibroblast feeder layers. Each week, nearly confluent 
cultures were trypsinized and replated (one passage). At each passage, a fraction 
of the cells were processed for genomic DNA isolation. Primers, including adaptors 
for Solexa sequencing, were used to amplify shRNAs from genomic DNA. Following 
the PCR amplification of hairpins, sequencing was performed on an Illumina/Solexa 
Genome Analyzer II according to the manufacturer’s protocols. Analyses and plots 
of DNA sequencing data were performed in the R statistical environment. The fold
change in shRNA representation after sequential passages in vitro was determined 
by comparing the shRNA representation in each sample to that in the control cell 
population collected 24 h after infection. The identities of all ‘hit’ shRNAs, as well as 
primary fold changes, are listed in Supplementary Tables 1-3. 
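
The comparison against the 24 h reference population can be sketched as follows. The analysis was performed in R; this Python version is only an illustration, and the normalization to reads per million, the pseudocount and all names and counts are assumptions that are not specified in the text.

```python
import math


def log2_fold_changes(sample_reads, reference_reads, pseudocount=1.0):
    """Per-shRNA log2 fold change of a passaged sample versus the 24 h reference.

    Both arguments map shRNA identifiers to raw read counts (hypothetical format).
    """
    sample_total = sum(sample_reads.values())
    reference_total = sum(reference_reads.values())
    changes = {}
    for shrna, ref_count in reference_reads.items():
        # Scale to reads per million so libraries of different depth are comparable;
        # the pseudocount avoids taking the log of zero for fully depleted shRNAs.
        ref_rpm = (ref_count + pseudocount) / reference_total * 1e6
        sample_rpm = (sample_reads.get(shrna, 0) + pseudocount) / sample_total * 1e6
        changes[shrna] = math.log2(sample_rpm / ref_rpm)
    return changes


# Made-up counts for illustration only.
reference = {"shRNA_A": 1000, "shRNA_B": 1000, "scramble": 1000}
passage5 = {"shRNA_A": 50, "shRNA_B": 1000, "scramble": 950}
changes = log2_fold_changes(passage5, reference)
for name, value in changes.items():
    print(f"{name}: log2 fold change = {value:.2f}")
```

In this toy example shRNA_A is strongly depleted relative to the scramble control, which is the pattern expected for a hairpin that targets a long-term self-renewal gene.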

Fluorescence-based competition assay. shRNA-transduced (RFP+) HF-SCs
(GFP+) were mixed with non-transduced HF-SCs and plated onto feeders
(GFP−). At each passage, the cells were trypsinized and replated, and a fraction
of the cells were used to measure the proportion of RFP+ GFP+/RFP− GFP+ cells
using flow cytometry on an LSRII FACS Analyzer (BD Biosciences). Analyses were 
carried out for five consecutive passages. 

RNAi construct and sequences. The RNAi lentiviral constructs were from the 
RNAi Consortium (TRC) mouse lentiviral library. Scramble shRNA (Ctrl shRNA) 
that does not target any mouse mRNA was used as a control. Lentiviral vectors
were packaged, concentrated and used for infection as previously described.
Sequences of individual shRNAs used in experiments are listed here, together with 
their symbol in oligoSeq. Tbx1 F2, CCGGGCTACCGGTATGCTTTCCATAC 
TCGAGTATGGAAAGCATACCGGTAGCTTTTTG; Tbx1 F3, CCGGCTGACC
AATAACCTGCTGGATCTCGAGATCCAGCAGGTTATTGGTCAGTTTTTG; 
Hmga2 A2, CCGGGCCACAACAAGTCGTTCAGAACTCGAGTTCTGAACG 
ACTTGTTGTGGCTTTTTG; Hmga2 A4, CCGGAGACCTAGGAAATGGCCA 
CAACTCGAGTTGTGGCCATTTCCTAGGTCTTTTTTG; Runx1 A2, CCGGG
CCCTCCTACCATCTATACTACTCGAGTAGTATAGATGGTAGGAGGGCT 
TTTTG; Runx1 A3, CCGGGTCTTTACAAATCCGCCACAACTCGAGTTGTG 
GCGGATTTGTAAAGACTTTTTG; Smad1 B3, CCGGGCCGAGTAACTGC 
GTCACCATCTCGAGATGGTGACGCAGTTACTCGGCTTTTT; Smad1 F1,
CCGGCCCATTTGGTTCCAAGCAGAACTCGAGTTCTGCTTGGAACCAAA 
TGGGTTTTT; Smad1 F2, CCGGACCGTGTATGAACTCACCAAACTCGAGT 
TTGGTGAGTTCATACACGGTTTTTT; Smad1 F4, CCGGTGGTGCTCTATT
GTGTACTATCTCGAGATAGTACACAATAGAGCACCATTTTT; and Smad1 
F5, CCGGTCCTATTTCATCCGTGTCTTACTCGAGTAAGACACGGATGAA 
ATAGGATTTTT. 
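
The listed oligonucleotides appear to share a common layout: a CCGG overhang, a 21-nucleotide sense stem, a CTCGAG loop, the reverse-complement antisense stem and a poly-T tail. The sketch below, which assumes that layout and uses hypothetical function names, checks whether the two stems of a listed sequence are consistent. A check of this kind is one way to catch transcription errors in the sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_hairpin(oligo: str, stem_len: int = 21, loop: str = "CTCGAG"):
    """Split an oligo into (sense, antisense) stems.

    Assumed layout: CCGG + sense stem + loop + antisense stem + poly-T tail.
    """
    oligo = oligo.replace(" ", "").upper()
    start = 4  # length of the CCGG overhang
    loop_start = start + stem_len
    if not oligo.startswith("CCGG") or oligo[loop_start:loop_start + len(loop)] != loop:
        raise ValueError("oligo does not match the assumed layout")
    anti_start = loop_start + len(loop)
    sense = oligo[start:loop_start]
    antisense = oligo[anti_start:anti_start + stem_len]
    return sense, antisense


def stem_is_consistent(oligo: str) -> bool:
    """True if the antisense stem is the reverse complement of the sense stem."""
    sense, antisense = split_hairpin(oligo)
    return antisense == reverse_complement(sense)


# Tbx1 F2 as listed above, with the line break removed.
TBX1_F2 = "CCGGGCTACCGGTATGCTTTCCATACTCGAGTATGGAAAGCATACCGGTAGCTTTTTG"
print(stem_is_consistent(TBX1_F2))
```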

Mice, depilation and BrdU labelling. Tbx1^fl/fl mice were obtained from A.
Baldini. To create conditional knockout mice, we mated hemizygous K14-Cre
(CD1) mice with homozygous Tbx1^fl/fl (C57BL/6) mice; F1 K14-Cre/Tbx1^fl/+
(CD1/C57BL/6) progeny were subsequently bred with homozygous Tbx1^fl/fl mice
to generate K14-Cre;Tbx1^fl/fl mice at a 25% Mendelian frequency. Depilation of
mid-dorsal hair follicles was achieved on anaesthetized mice to provide a prolif- 
erative stimulus and to synchronize a population of anagen hair follicles. The mid- 
dorsum was coated with molten wax, which was peeled off after hardening. BrdU 
(Sigma-Aldrich) pulse experiments involved intraperitoneal injections 
(50 μg BrdU per g body weight) twice a day. After BrdU pulses for the indicated
times, animals were killed. The skins were embedded in OCT compound, frozen in 
dry ice, sectioned using a cryostat (Richard-Allan Scientific) and stained for 
immunofluorescence using antibodies specific for BrdU. All animals were main- 
tained in an AAALAC-approved animal facility, and procedures were performed 
with IACUC-approved protocols. 
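
The 25% frequency of conditional knockout offspring follows from the second cross if the K14-Cre transgene and the Tbx1 locus segregate independently. That independence is an assumption of this sketch, as are the allele labels and the function name. The code enumerates all gamete combinations for a hemizygous K14-Cre, heterozygous floxed parent crossed with a homozygous floxed parent.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Enumerate offspring genotype frequencies for unlinked loci.

    Each parent is a list of loci; each locus is a pair of alleles.
    """
    gametes1 = list(product(*parent1))  # one allele per locus from parent 1
    gametes2 = list(product(*parent2))  # one allele per locus from parent 2
    total = len(gametes1) * len(gametes2)
    dist = {}
    for g1, g2 in product(gametes1, gametes2):
        genotype = tuple(tuple(sorted(pair)) for pair in zip(g1, g2))
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, total)
    return dist


# Locus 1: K14-Cre transgene ("Cre" present, "-" absent); locus 2: Tbx1 ("fl" or "+").
f1_parent = [("Cre", "-"), ("fl", "+")]    # K14-Cre hemizygous, Tbx1 fl/+
floxed_parent = [("-", "-"), ("fl", "fl")]  # no Cre, Tbx1 fl/fl
dist = offspring_distribution(f1_parent, floxed_parent)
cko = dist[(("-", "Cre"), ("fl", "fl"))]
print(f"K14-Cre;Tbx1 fl/fl frequency: {cko}")
```

Half of the offspring inherit the Cre transgene and half are homozygous for the floxed allele, giving the expected one in four.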

Semi-quantitative RT-PCR. RNAs were isolated from cells using an RNeasy kit 
(QIAGEN), and DNase treatment was performed to remove genomic DNA. Equal 
RNA amounts were added to reverse-transcriptase reaction mix (Invitrogen) with 
oligo(dT), as a primer. Semi-quantitative PCR was conducted with a LightCycler 
system (Roche Diagnostics). Reactions were performed using the indicated 
primers and template mixed with the LightCycler DNA Master SYBR Green kit 
and were run for 45 cycles. The specificity of the reactions was determined by 
subsequent melting curve analysis. LightCycler analysis software was used to 
remove background fluorescence (noise band). The number of cycles needed to 
reach the crossing point for each sample was used to calculate the amount of each 
product using the 2^(−ΔΔCt) method. The levels of PCR product were expressed as
a function of Ppib. The primers were designed to produce a product spanning
exon-intron boundaries. The sequences of the primers were as follows: Ppib sense,
GTGAGCGCTTCCCAGATGAGA; Ppib antisense, T@CCGGAGTCGACAAT 
GATG; Sox9 sense, CGGCGGAGGAAGTCGGTGAAGAAC; Sox9 antisense, 
GGTGGGTGCGGTGCTGCTGATG; Tbx1 sense, GCTGTGGGACGAGTTCA 
ATC; Tbx1 antisense, ACGTGGGGAACATTCGTCT; Lhx2 sense, CCAGCTTC
GGACAATGAAGT; Lhx2 antisense, TTTCCTGCCGTAAAAGGTTG; Nfatc1
sense, AACGCCCTGACCACCGATAGCACT; Nfatc1 antisense, CCCGGCT
GCCTTCCGTCTCATA; Runx1 sense, CTCCGTGCTACCCACTCACT; Runx1 
antisense, ATGACGGTGACCAGAGTGC; Hmga2 sense, AAGGCAGCAAAA 
ACAAGAGC; Hmga2 antisense, CCGTTTTTCTCCAATGGTCT; Smad1 sense, 
AACACCAGGCGACATATTGG; Smad1 antisense, CACTGAGGCATTCCG 
CATA; Smad5 sense, GCAGTAACATGATTCCTCAGACC; Smad5 antisense, 
GCGACAGGCTGAACATCTCT; Smad8 sense, CGGATGAGCTTTGTGAA 
GG; Smad8 antisense, GGGTGCTCGTGACATCCT; Id1 sense, GAGTCTGAAG 
TCGGGACCAC; Id1 antisense, TTTTCCTCTTGCCTCCTGAA; Id2 sense, 
AATGGCCTTTTTGACACGAG; Id2 antisense, AAAGCAAGCAATCAACA
TTCAA; Id3 sense, GAGGAGCTTTTGCCACGAC; and Id3 antisense, 
TGAAGAGGGCTGGGTTAAGA. 
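
Relative quantification against a reference transcript such as Ppib is commonly calculated with the 2^(−ΔΔCt) formula. The sketch below assumes 100% amplification efficiency and uses made-up crossing-point values; the function name is hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript by the 2^(-ddCt) method.

    Assumes the product doubles every cycle (100% amplification efficiency).
    """
    # Normalize each target Ct to the reference gene (for example Ppib).
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized sample with the normalized control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Made-up crossing points: the target crosses two cycles earlier in the sample
# than in the control, with identical reference values.
fold = relative_expression(24.0, 20.0, 26.0, 20.0)
print(f"fold change = {fold}")
```

A target that reaches the crossing point two cycles earlier than in the control, at equal reference levels, corresponds to a fourfold higher transcript level under these assumptions.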

Histology and immunofluorescence. Tissues were embedded in OCT com- 
pound, and frozen sections were fixed in 4% paraformaldehyde and subjected to 
immunofluorescence microscopy or haematoxylin and eosin staining as previ- 
ously described'. The antibodies (and their dilutions) used were as follows: 
anti-LHX2 (rabbit, 1:2,500, Fuchs lab), anti-SOX9 (rabbit, 1:1,000, Fuchs lab), 
anti-P-cadherin (goat, 1:100, R&D Systems), anti-α6-integrin (rat, 1:100, 
Pharmingen), anti-K5 (rabbit, 1:500, Fuchs lab), anti-CD34 (rat, 1:100, 
Pharmingen), anti-BrdU (rat, 1:500, Abcam), anti-active-caspase 3 (rabbit, 
1:500, R&D Systems), anti-TBX1 (rabbit, 1:100, Zymed), and FITC-conjugated 
(1:100, Jackson) or Alexa594-conjugated (1:1,000, Molecular Probes) secondary 
antibodies. Nuclei were stained using 4',6-diamidino-2-phenylindole (DAPI). 
Imaging was performed using either a Zeiss Axioskop equipped with Spot RT 
(Diagnostic Instruments) or a Zeiss LSM 510 laser-scanning microscope (Carl 
Zeiss MicroImaging) through a 40× water objective or a 25× objective. RGB 
images were assembled in Adobe Photoshop CS3, and panels were labelled in 
Adobe Illustrator CS3. 

Isolation of HF-SCs and flow cytometry. Subcutaneous fat was removed from 
the skins with a scalpel, and the whole skin was placed dermis down on trypsin 
(Gibco) at 37 °C for 30 min. Single-cell suspensions were obtained by scraping the 
skin gently. The cells were then filtered with strainers (70 µm, followed by 40 µm). 
Cell suspensions were incubated with the appropriate antibodies for 30 min on ice. 
The following antibodies were used: anti-CD34-Alexa647 (1:100, eBioscience) 
and anti-α6-integrin-PE (1:100, BD Biosciences). DAPI was used to exclude dead 
cells. Cell isolations were performed on FACSAria sorters equipped with 
FACSDiva software (BD Biosciences). FACS analyses were performed using 
LSRII FACS Analyzers and then analysed with the FlowJo program. 

RNA isolation and microarray analyses. RNAs from FACS-purified Tbx1-cKO 
mice and WT HF-SCs 2 days post depilation were provided to the Genomics Core 
Facility at Memorial Sloan-Kettering Cancer Center for quality control, quan- 
tification, reverse transcription, labelling, and hybridization to MOE430A 2.0 
microarray chips (Affymetrix). Two entirely independent samples were used for 
data analyses. Arrays were scanned as per the manufacturer’s specifications for the 
Affymetrix MOE430v2 chip. Images were background-subtracted. Probe sets were 
identified as differentially expressed when the average fold change was ≥1.8 
(P < 0.1). Probe sets selected for visualization were log2 transformed, analysed 
with hierarchical clustering (Pearson correlation, average linkage) and visualized 
with heat maps to assist in interpretation. 
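
The selection rule above can be written down as a small filter. This is a sketch under stated assumptions: the function names are hypothetical, the fold change is treated symmetrically (up or down), and the P value is taken as an input because the test used to obtain it is not specified in the text.

```python
import math


def is_differentially_expressed(mean_ko, mean_wt, p_value,
                                min_fold=1.8, max_p=0.1):
    """Apply the fold-change and P-value thresholds to one probe set.

    mean_ko / mean_wt: average signal in knockout and wild-type samples
    (both must be positive). Treating up- and down-regulation alike is an
    assumption made for this sketch.
    """
    fold = max(mean_ko, mean_wt) / min(mean_ko, mean_wt)
    return fold >= min_fold and p_value < max_p


def log2_transform(values):
    """Log2-transform signal values before clustering or heat-map display."""
    return [math.log2(v) for v in values]


# Hypothetical probe-set signals, for illustration only.
print(is_differentially_expressed(200.0, 100.0, 0.05))
print(log2_transform([1.0, 2.0, 8.0]))
```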

Noggin intradermal injection. Recombinant mouse noggin (5 µg ml⁻¹, R&D 
Systems) was injected intradermally, together with beads, into the back skin for 
3 days post depilation. The skin was analysed on the third day. BrdU was injected 
twice in the last 24 h before harvesting the skin. 

In vitro BMP4 treatment. For BMP4 treatment experiments, mouse keratinocytes 
grown in six-well plates and transfected with Tbx1-GFP (a Tbx1 expression vector 
obtained by cloning a mouse cDNA into the pCMV6-AC-GFP plasmid) were 
treated with 100 ng ml⁻¹ mouse recombinant BMP4 for the indicated period. 
After treatment, cells were trypsinized, and GFP+ versus GFP− cells were isolated 
by FACS and separately processed for total RNA extraction and rtPCR analysis. 

Immunoblotting. Mouse keratinocytes transduced with scramble or Smad1 
shRNAs by lentiviral infection were selected in 1 µg ml⁻¹ puromycin-containing 
media for 4 days. Cells were treated with or without BMP4 (100 ng ml⁻¹) for 3 h 
and lysed directly in SDS sample buffer (50 mM Tris-HCl, pH 6.8, 100 mM DTT, 
2% SDS, 0.1% bromophenol blue and 10% glycerol). Gel electrophoresis was 
performed using 4-12% NuPAGE Bis-Tris gradient gels (Invitrogen), and sepa- 
rated proteins were transferred to PVDF membranes (Millipore). Membranes 
were blocked for 1 h in Odyssey blocking buffer (LI-COR Biosciences), then 
incubated with primary antibodies in blocking buffer overnight at 4 °C. The primary 
antibodies used were as follows: mouse anti-SMAD1 (1:300, Abcam), rabbit anti- 
pSMAD1/5/8 (Ser463/465) (1:2,000, Millipore), mouse anti-SMAD2/3 (1:2,000, BD 
Biosciences), rabbit anti-pSMAD2 (Ser465/467) (1:2,000, Cell Signaling 
Technology), rabbit anti-cSMAD5 (1:500, Abcam) and mouse anti-α-tubulin 
(1:2,000, Sigma). Secondary antibodies were conjugated to IRDye 680 or IRDye 
800 (1:15,000, LI-COR Biosciences). 

BMP-reporter experiments. Construction of a lentiviral BMP reporter (pLKO- 
H2B-CFP-BRE-ZsGreen) is described elsewhere*’. Briefly, H2B-CFP works as a 
transduction marker, which is constitutively transcribed from the PGK promoter. 
As a reporter, ZsGreen is transcribed from a minimal CMV promoter conjugated 
to the BMP response element (BRE) in the presence of BMP. We transduced the 
BMP reporter into mouse keratinocytes, FACS-sorted the transduced cells based 
on H2B-CFP expression and transduced these cells with viral vectors carrying 
scramble or Smad1 shRNAs. After puromycin selection, we tested BMP-reporter 
activities in the presence or absence of BMP (100 ng ml⁻¹) for 24 h. For 
visualization and quantification of direct fluorescent signals from the BMP reporter, 
keratinocytes were seeded onto coverslips and fixed with 4% paraformaldehyde 
for 10 min at room temperature. 

Statistics. To determine the significance between two groups, indicated in the 
figures by asterisks, comparisons were made using Student’s t-test, performed by 
Prism5 software or Microsoft Excel. Box-and-whisker plots are used to describe 
the entire population without assumptions about the statistical distribution. 


31. R Foundation for Statistical Computing. The R Project for Statistical Computing 
(http://www.r-project.org) (2011). 

32. Beronja, S., Livshits, G., Williams, S. & Fuchs, E. Rapid functional dissection of 
genetic networks via tissue-specific transduction and RNAi in mouse embryos. 
Nature Med. 16, 821-827 (2010). 

Multiple studies have confirmed the contribution of rare de novo 
copy number variations to the risk for autism spectrum disorders’*. 
But whereas de novo single nucleotide variants have been identified 
in affected individuals‘, their contribution to risk has yet to be 
clarified. Specifically, the frequency and distribution of these muta- 
tions have not been well characterized in matched unaffected 
controls, and such data are vital to the interpretation of de novo 
coding mutations observed in probands. Here we show, using 
whole-exome sequencing of 928 individuals, including 200 pheno- 
typically discordant sibling pairs, that highly disruptive (nonsense 
and splice-site) de novo mutations in brain-expressed genes are 
associated with autism spectrum disorders and carry large effects. 
On the basis of mutation rates in unaffected individuals, we demon- 
strate that multiple independent de novo single nucleotide variants 
in the same gene among unrelated probands reliably identifies risk 
alleles, providing a clear path forward for gene discovery. Among a 
total of 279 identified de novo coding mutations, there is a single 
instance in probands, and none in siblings, in which two independ- 
ent nonsense variants disrupt the same gene, SCN2A (sodium 
channel, voltage-gated, type II, α subunit), a result that is highly 
unlikely by chance. 

We completed whole-exome sequencing in 238 families from the 
Simons Simplex Collection (SSC), a comprehensively phenotyped 
autism spectrum disorders (ASD) cohort consisting of pedigrees with 
two unaffected parents, an affected proband, and, in 200 families, an 
unaffected sibling’. Exome sequences were captured with NimbleGen 
oligonucleotide libraries, subjected to DNA sequencing on the 
Illumina platform, and genotype calls were made at targeted bases 
(Supplementary Information)*’. On average, 95% of the targeted bases 
in each individual were assessed by ≥8 independent sequence reads; 
only those bases showing ≥20 independent reads in all family 
members were considered for de novo mutation detection. This 
allowed for analysis of de novo events in 83% of all targeted bases 
and 73% of all exons and splice sites in the RefSeq hg18 database 
(http://www.ncbi.nlm.nih.gov/RefSeq/; Supplementary Table 1; 
Supplementary Data 1). Given uncertainties regarding the sensitivity 
of detection of insertion-deletions, case-control comparisons reported 
here consider only single base substitutions (Supplementary Informa- 
tion). Validation was attempted for all predicted de novo single 
nucleotide variants (SNVs) via Sanger sequencing of all family 
members, with sequence readers blinded to affected status; 96% were 
successfully validated. We determined there was no evidence of 
systematic bias in variant detection between affected and unaffected 
siblings through comparisons of silent de novo, non-coding de novo, 
and novel transmitted variants (Fig. 1a; Supplementary Figs 1-5; 
Supplementary Information). 

Among 200 quartets (Table 1), 125 non-synonymous de novo SNVs 
were present in probands and 87 in siblings: 15 of these were nonsense 
(10 in probands; 5 in siblings) and 5 altered a canonical splice site (5 in 
probands; 0 in siblings). There were 2 instances in which de novo SNVs 
were present in the same gene in two unrelated probands; one of these 
involved two independent nonsense variants (Table 2). Overall, the 
total number of non-synonymous de novo SNVs was significantly 
greater in probands compared to their unaffected siblings (P = 0.01, 
two-tailed binomial exact test; Fig. 1a; Table 1) as was the odds ratio 
(OR) of non-synonymous to silent mutations in probands versus 
siblings (OR = 1.93; 95% confidence interval (CI), 1.11-3.36; 
P = 0.02, asymptotic test; Table 1). Restricting the analysis to nonsense 
and splice site mutations in brain-expressed genes resulted in substan- 
tially increased estimates of effect size and demonstrated a significant 
difference in cases versus controls based either on an analysis of muta- 
tion burden (N = 13 versus 3; P = 0.02, two-tailed binomial exact test; 
Fig. 1a; Table 1) or an evaluation of the odds ratio of nonsense and 
splice site to silent SNVs (OR = 5.65; 95% CI, 1.44-22.2; P= 0.01, 
asymptotic test; Fig. 1b; Table 1). 
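
The burden comparisons above can be checked with a few lines of code. The sketch below assumes that the two-tailed binomial exact test asks whether the de novo SNVs split evenly between probands and siblings (null probability 0.5) and that the two-tailed value sums every outcome no more likely than the observed one; both are assumptions, since the exact procedure is given only in the Supplementary Information. The function name is hypothetical.

```python
from math import comb


def binomial_two_tailed(k, n, p=0.5):
    """Exact two-tailed binomial P value for k successes in n trials.

    Sums the probability of every outcome that is no more likely than the
    observed one. Intended for modest n; very large n would overflow floats.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


# Non-synonymous de novo SNVs: 125 in probands, 87 in siblings.
print(binomial_two_tailed(125, 125 + 87))
# Brain-expressed nonsense and splice site SNVs: 13 versus 3.
print(binomial_two_tailed(13, 13 + 3))
```

Under these assumptions both results land close to the rounded P values quoted in the text (0.01 and 0.02).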

To determine whether factors other than diagnosis of ASD could 
explain our findings, we examined a variety of potential covariates, 
including parental age, IQ and sex. We found that the rate of de novo 
SNVs indeed increases with paternal age (P = 0.008, two-tailed 
Poisson regression) and that paternal and maternal ages are highly 
correlated (P<0.0001, two-tailed linear regression). However, 
although the mean paternal age of probands in our sample was 1.1 
years higher than their unaffected siblings, re-analysis accounting for 
age did not substantively alter any of the significant results reported 
here (Supplementary Information). Similarly, no significant relation- 
ship was observed between the rate of de novo SNVs and proband IQ 
(P= 0.19, two-tailed linear regression, Supplementary Information) 
or proband sex (P = 0.12, two-tailed Poisson regression; Supplemen- 
tary Fig. 6; Supplementary Information). 

Overall, these data demonstrate that non-synonymous de novo 
SNVs, and particularly highly disruptive nonsense and splice-site de 
novo mutations, are associated with ASD. On the basis of the conser- 
vative assumption that de novo single-base coding mutations observed 
in siblings confer no autism liability, we estimate that at least 14% of 
²Child Study Center, Department of Pediatrics, Yale University School of Medicine, 230 South Frontage Road, New Haven, Connecticut 06520, USA. ³Neurogenetics Program, UCLA, 695 Charles E. Young Dr. South, Los Angeles, California 90095, USA. ⁴Department of Genetics, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, Connecticut 06510, USA. ⁵Department of Computer Science, Yale Center for Genome Analysis, Yale University, 51 Prospect Street, New Haven, Connecticut 06511, USA. ⁶Department of Neurobiology, Kavli Institute for Neuroscience, Yale University School of Medicine, 333 Cedar Street, New Haven, Connecticut 06520, USA. ⁷Department of Neurosurgery, Center for Human Genetics and Genomics, Program on Neurogenetics, Yale University School of Medicine, 333 Cedar Street, New Haven, Connecticut 06520, USA. ⁸Yale Center for Genome Analysis, 300 Heffernan Drive, West Haven, Connecticut 06516, USA. ⁹Department of Statistics, Carnegie Mellon University, 130 DeSoto Street, Pittsburgh, Pennsylvania 15213, USA. ¹⁰Department of Psychiatry and Human Genetics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213, USA. 


*These authors contributed equally to this work. 




©2012 Macmillan Publishers Limited. All rights reserved 


[Figure 1 graphic not reproduced. Recoverable labels: legend, proband and sibling; a, rate of de novo SNVs (×10⁻⁸) for silent, non-synonymous and nonsense† variants in all and brain-expressed genes (P = 0.01, P = 0.001); b, silent, missense and nonsense† variants (OR 5.65, P = 0.01); c, number of subjects against a Poisson distribution.]

Figure 1 | Enrichment of non-synonymous de novo variants in probands 
relative to sibling controls. a, The rate of de novo variants is shown for 200 
probands (red) and matched unaffected siblings (blue). ‘All’ refers to all RefSeq 
genes in hg18, ‘Brain’ refers to the subset of genes that are brain-expressed™ and 
‘Non-syn’ to non-synonymous SNVs (including missense, nonsense and splice 
site SNVs). Error bars represent the 95% confidence intervals and P values are 
calculated with a two-tailed binomial exact test. b, The proportion of 
transmitted variants in brain-expressed genes is equal between 200 probands 
(red) and matched unaffected siblings (blue) for all mutation types and allele 
frequencies, including common (≥1%), rare (<1%) and novel (single allele in 
one of the 400 parents); in contrast, both non-synonymous and nonsense de 
novo variants show significant enrichment in probands compared to unaffected 
siblings (73.7% versus 66.7%, P = 0.01, asymptotic test and 9.5% versus 3.1%, 
P = 0.01 respectively). c, The frequency distribution of brain-expressed 
non-synonymous de novo SNVs is shown per sample for probands (red) and siblings 
(blue). Neither distribution differs from the Poisson distribution (black line), 
suggesting that multiple de novo SNVs within a single individual do not 
confirm ASD risk. Nonsense† represents the combination of nonsense and 
splice site SNVs. 

affected individuals in the SSC carry de novo SNV risk events 
(Supplementary Information). Moreover, among probands and considering 
brain-expressed genes, an estimated 41% of non-synonymous 
de novo SNVs (95% CI, 21-58%) and 77% of nonsense and splice site 
de novo SNVs (95% CI, 33-100%) point to bona fide ASD-risk loci 
(Supplementary Information). 

We next set out to evaluate which of the particular de novo SNVs 
identified in our study confer this risk. On the basis of our prior work’, 

Table 1 | Distribution of SNVs between probands and siblings 


Pro, probands (N = 200); Sib, siblings (N = 200). 

| Category | Total SNVs* (Pro) | Total SNVs* (Sib) | SNVs per subject (Pro) | SNVs per subject (Sib) | Per-base SNV rate, ×10⁻⁸ (Pro) | Per-base SNV rate, ×10⁻⁸ (Sib) | P† | Odds ratio (95% CI)‡ |
|---|---|---|---|---|---|---|---|---|
| De novo, all genes | | | | | | | | |
| All | 154 | 125§ | 0.77 | 0.63 | 1.58 | 1.31 | 0.09 | NA |
| Silent | 29 | 39 | 0.15 | 0.20 | 0.29 | 0.40 | 0.28 | NA |
| All non-synonymous | 125 | 87 | 0.63 | 0.44 | 1.29 | 0.92 | 0.01 | 1.93 (1.11-3.36) |
| Missense | 110 | 82 | 0.55 | 0.41 | 1.13 | 0.86 | 0.05 | 1.80 (1.03-3.16) |
| Nonsense/splice site | 15 | 5 | 0.08 | 0.03 | 0.16 | 0.05 | 0.04 | 4.03 (1.32-12.4) |
| De novo, brain-expressed genes | | | | | | | | |
| All | 137 | 96 | 0.69 | 0.48 | 1.41 | 1.01 | 0.01 | NA |
| Silent | 23 | 30 | 0.12 | 0.15 | 0.24 | 0.31 | 0.41 | NA |
| All non-synonymous | 114 | 67 | 0.57 | 0.34 | 1.18 | 0.71 | 0.001 | 2.22 (1.19-4.13) |
| Missense | 101 | 64 | 0.51 | 0.32 | 1.04 | 0.68 | 0.005 | 2.06 (1.10-3.85) |
| Nonsense/splice site | 13 | 3 | 0.07 | 0.02 | 0.14 | 0.03 | 0.02 | 5.65 (1.44-22.2) |
| Novel transmitted, all genes | | | | | | | | |
| All | 26,565 | 26,542 | 133 | 133 | 277 | 277 | 0.92 | NA |
| Silent | 8,567 | 8,642 | 43 | 43 | 90 | 91 | 0.57 | NA |
| All non-synonymous | 17,998 | 17,900 | 90 | 90 | 188 | 187 | 0.61 | 1.01 (0.98-1.05) |
| Missense | 17,348 | 17,250 | 87 | 86 | 181 | 180 | 0.60 | 1.01 (0.98-1.05) |
| Nonsense/splice site | 650 | 650 | 3.3 | 3.3 | 7 | 7 | 1.00 | 1.01 (0.90-1.13) |
| Novel transmitted, brain-expressed genes | | | | | | | | |
| All | 20,942 | 20,982 | 105 | 105 | 219 | 220 | 0.85 | NA |
| Silent | 6,884 | 6,981 | 34 | 35 | 72 | 74 | 0.42 | NA |
| All non-synonymous | 14,058 | 14,001 | 70 | 70 | 147 | 146 | 0.74 | 1.02 (0.98-1.06) |
| Missense | 13,588 | 13,525 | 68 | 68 | 142 | 141 | 0.71 | 1.02 (0.98-1.06) |
| Nonsense/splice site | 470 | 476 | 2.3 | 2.4 | 5 | 5 | 0.87 | 1.00 (0.88-1.14) |

* An additional 15 de novo variants were seen in the probands of 25 trio families; all were missense and 14 were brain-expressed. 
† The P values compare the number of variants between probands and siblings using a two-tailed binomial exact test (Supplementary Information); P values below 0.05 are highlighted in bold. 
‡ The odds ratio calculates the proportion of variants in a specific category to silent variants and then compares these ratios in probands versus siblings. NA, not applicable. 
§ The sum of silent and non-synonymous variants is 126; however, one nonsense and two silent de novo variants were identified in KANK1 in a single sibling, suggesting a single gene conversion event. This event contributed a maximum count of one to any analysis. 
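
The odds ratios in Table 1 follow directly from the counts, as the footnote describes. The sketch below computes the ratio of a variant category to silent variants in probands versus siblings, with a log-scale (Woolf) 95% interval. The choice of Woolf's method is an assumption; the table only states that the test is asymptotic. The second function shows one reading of the proportions quoted in the text (41% and 77%) as the excess of proband over sibling counts; that interpretation is also an assumption, and both function names are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(pro_cat, pro_silent, sib_cat, sib_silent, z=1.96):
    """Odds ratio of (category : silent) in probands versus siblings.

    Returns (odds ratio, lower bound, upper bound), using the standard error
    of the log odds ratio from the four cell counts (Woolf's method).
    """
    ratio = (pro_cat / pro_silent) / (sib_cat / sib_silent)
    se = sqrt(1 / pro_cat + 1 / pro_silent + 1 / sib_cat + 1 / sib_silent)
    return ratio, exp(log(ratio) - z * se), exp(log(ratio) + z * se)


def excess_fraction(pro_count, sib_count):
    """Share of proband variants in excess of the sibling baseline."""
    return (pro_count - sib_count) / pro_count


# All genes, non-synonymous versus silent de novo SNVs.
print(odds_ratio_ci(125, 29, 87, 39))
# Brain-expressed genes, nonsense/splice site versus silent de novo SNVs.
print(odds_ratio_ci(13, 23, 3, 30))
# Brain-expressed non-synonymous and nonsense/splice site excess in probands.
print(excess_fraction(114, 67), excess_fraction(13, 3))
```

With the Table 1 counts these calls give values that round to 1.93 (1.11-3.36) and 5.65 (1.44-22.2), and to 41% and 77%.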
Table 2 | Loss of function mutations in probands 

| Gene symbol | Gene name | Mutation type |
|---|---|---|
| ADAM33 | ADAM metallopeptidase domain 33 | Nonsense |
| CSDE1 | cold shock domain containing E1, RNA-binding | Nonsense |
| EPHB2 | EPH receptor B2 | Nonsense |
| FAM8A1 | family with sequence similarity 8, member A1 | Nonsense |
| FREM3 | FRAS1 related extracellular matrix 3 | Nonsense |
| MPHOSPH8 | M-phase phosphoprotein 8 | Nonsense |
| PPM1D | protein phosphatase, Mg2+/Mn2+ dependent 1D | Nonsense |
| RAB2A | RAB2A, member RAS oncogene family | Nonsense |
| SCN2A | sodium channel, voltage-gated, type II, α subunit | Nonsense |
| SCN2A | sodium channel, voltage-gated, type II, α subunit | Nonsense |
| BTN1A1 | butyrophilin, subfamily 1, member A1 | Splice site |
| FCRL6 | Fc receptor-like 6 | Splice site |
| KATNAL2 | katanin p60 subunit A-like 2 | Splice site |
| NAPRT1 | nicotinate phosphoribosyltransferase domain containing 1 | Splice site |
| RNF38 | ring finger protein 38 | Splice site |
| SCP2 | sterol carrier protein 2 | Frameshift* |
| SHANK2 | SH3 and multiple ankyrin repeat domains 2 | Frameshift* |

* Frameshift de novo variants are not included in any of the reported case-control comparisons 
(Supplementary Information). 



we hypothesized that estimating the probability of observing multiple 
independent de novo SNVs in the same gene in unrelated individuals 
would provide a more powerful statistical approach to identifying 
ASD-risk genes than the alternative of comparing mutation counts 
in affected versus unaffected individuals. Consequently, we conducted 
simulation experiments focusing on de novo SNVs in brain-expressed 
genes, using the empirical data for per-base mutation rates and taking 
into account the actual distribution of gene sizes and GC content across 
the genome (Supplementary Information). We calculated probabilities 
(P) and the false discovery rate (Q) based on a wide range of assump- 
tions regarding the number of genes conferring ASD risk (Supplemen- 
tary Fig. 7; Fig. 2). On the basis of 150,000 iterations, we determined 
that under all models, two or more nonsense and/or splice site de novo 
mutations were highly unlikely to occur by chance (P= 0.008; 
Q = 0.005; Supplementary Information; Fig. 2a). Importantly, these 
thresholds were robust both to sample size, and to variation in our 
estimates of locus heterogeneity. Similarly, in our sample, two or more 
nonsense or splice site de novo mutations remained statistically sig- 
nificant when the simulation was performed using the lower bound of 
the 95% confidence interval for the estimate of de novo mutation rates 
in probands (Supplementary Fig. 7). 

Only a single gene in our cohort, SCN2A, met these thresholds 
(P = 0.008; Fig. 2a), with two probands each carrying a nonsense de 
novo SNV (Table 2). This finding is consistent with a wealth of data 
showing overlap of genetic risks for ASD and seizure’. Gain of function 
mutations in SCN2A are associated with a range of epilepsy pheno- 
types; a nonsense de novo mutation has been described in a patient 
with infantile epileptic encephalopathy and intellectual decline’, de 
novo missense mutations with variable electrophysiological effects 
have been found in cases of intractable epilepsy’®, and transmitted rare 
missense mutations have been described in families with idiopathic 
ASD". Of note, the individuals in the SSC carrying the nonsense de 
novo SNVs have no history of seizure. 

We then considered whether alternative approaches described in 
the recent literature*”’, including identifying multiple de novo events 
in a single individual or predicting the functional consequences of 
missense mutations, might help identify additional ASD-risk genes. 
However, we found no differences in the distribution or frequency of 
multiple de novo events within individuals in the case versus the 
control groups (Fig. 1c). In addition, when we examined patients 
carrying large de novo ASD-risk CNVs, we found a trend towards 
fewer non-synonymous de novo SNVs (Supplementary Fig. 11; Sup- 
plementary Information). Consequently, neither finding supported a 
‘two de novo hit’ hypothesis. Similarly, we found no evidence that widely 
used measures of conservation or predictors of protein disruption, 
such as PolyPhen2"’, SIFT’, GERP'*, PhyloP’® or Grantham Score’’, 
either alone or in combination differentiated de novo non-synonymous 
SNVs in probands compared to siblings (Supplementary Fig. 9; 
Supplementary Information). Additionally, among probands, the de 
novo SNVs in our study were not significantly over-represented in 
previously established lists of synaptic genes'*”°, genes on chromosome 
X, autism-implicated genes’, intellectual disability genes’, genes within 
ASD-risk associated CNVs* or de novo non-synonymous SNVs 
identified in schizophrenia probands’”". Finally we conducted pathway and 
protein-protein interaction analyses” for all non-synonymous de novo 
SNVs, all brain-expressed non-synonymous de novo SNVs and all 
nonsense and splice site de novo SNVs (Supplementary Figs 9, 10; 
Supplementary Information) and did not find a significant enrichment 
among cases versus controls that survived correction for multiple 
comparisons, though these studies were of limited power. 

Figure 2 | Identification of multiple de novo mutations in the same gene 
reliably distinguishes risk-associated mutations. a, Results of a simulation 
experiment modelling the likelihood of observing two independent nonsense/ 
splice site de novo mutations in the same brain-expressed gene among 
unrelated probands. We modelled the observed rate of de novo brain-expressed 
mutations in probands and siblings, gene size, GC content and varying degrees 
of locus heterogeneity, including 100, 333, 667 or 1,000 ASD-contributing 
genes, as well as using the top 1% of genes derived from a model of exponential 
distribution of risk (indicated by colour). A total of 150,000 iterations were run. 
The rate of occurrences of two or more de novo variants in non-ASD genes was 
used to estimate the P-value (Supplementary Fig. 7) while the ratio 
of occurrences of two or more de novo variants in non-ASD genes to similar 
occurrences in ASD genes was used to estimate the false discovery rate (Q). The 
identification of two independent nonsense/splice site de novo variants in a 
brain-expressed gene in this sample provides significant evidence for ASD 
association (P = 0.008; Q = 0.005) for all models. This observation remained 
statistically significant when the simulation was repeated using the lower bound 
of the 95% confidence interval for the estimate of the de novo mutation rate in 
probands (Supplementary Fig. 7). b, The simulation described in a was used to 
predict the number of genes that will be found to carry two or more nonsense/ 
splice site de novo mutations for a sample of a given size (specified on the x axis). 
c, The simulation was repeated for non-synonymous de novo mutations. The 
identification of three or more independent non-synonymous de novo 
mutations in a brain-expressed gene provides significant evidence for ASD 
association (P < 0.05; Q < 0.05) in the sample reported here; however, these 
thresholds are sensitive both to sample size and heterogeneity models. 

These analyses demonstrate that neither the type nor the number of 
de novo mutations observed solely in a single individual provides 
significant evidence for association with ASD. Moreover, we deter- 
mined that in the SSC cohort at least three, and most often four or 
more, brain-expressed non-synonymous de novo SNVs in the same 
gene would be necessary to show a significant association (Fig. 2c; 
Supplementary Figs 7, 8). Unlike the case of disruptive nonsense 
and splice site mutations, these simulations were highly sensitive to 
both sample size and heterogeneity models (Fig. 2c; Supplementary 
Figs 7, 8; Supplementary Information). 

Finally, at the completion of our study, we had the opportunity to 
combine all de novo events in our sample with those identified in an 
independent whole-exome analysis of non-overlapping Simons 
Simplex families that focused predominantly on trios’. From a total 
of 414 probands, two additional genes were found to carry two highly 
disruptive mutations each, KATNAL2 (katanin p60 subunit A-like 2) 
(our results and ref. 23) and CHD8 (chromodomain helicase DNA 
binding protein 8) (ref. 23), thereby showing association with the 
ASD phenotype. 

Overall, our results substantially clarify the genomic architecture of 
ASD, demonstrate significant association of three genes—SCN2A, 
KATNAL2 and CHD8—and predict that approximately 25-50 addi- 
tional ASD-risk genes will be identified as sequencing of the 2,648 SSC 
families is completed (Fig. 2b). Rare non-synonymous de novo SNVs 
are associated with risk, with odds ratios for nonsense and splice-site 
mutations in the range previously described for large multigenic de 
novo CNVs’. It is important to note that these estimates reflect a mix of 
risk and neutral mutations in probands. We anticipate that the true 
effect size for specific SNVs and mutation classes will be further 
clarified as more data accumulate. From the distribution of large 
multi-genic de novo CNVs in probands versus siblings, we previously 
estimated the number of ASD-risk loci at 234 (ref. 3). Using the same 
approach, the current data result in a point estimate of 1,034 genes; 
however, the confidence intervals are large and the distribution of this 
risk among these loci is unknown (Supplementary Information). What 
is clear is that our results strongly support a high degree of locus 
heterogeneity in the SSC cohort, involving hundreds of genes or more. 
Finally, via examination of mutation rates in well-matched controls, 
we have determined that the observation of highly disruptive de novo 
SNVs clustering within genes can robustly identify risk-conferring alleles. 

The focus on recurrent rare de novo mutation described here pro- 
vided sufficient statistical power to identify associated genes in a rela- 
tively small cohort—despite both a high degree of locus heterogeneity 
and the contribution of intermediate genetic risks. This approach 
promises to be valuable for future high-throughput sequencing efforts 
in ASD and other common neuropsychiatric disorders. 


METHODS SUMMARY 

Sample selection. In total 238 families (928 individuals) were selected from the 
SSC°. Thirteen families (6%) did not pass quality control, leaving 225 families (200 
quartets, 25 trios) for analysis (Supplementary Data 1). Of the 200 quartets, 194 
(97%) probands had a diagnosis of autism and 6 (3%) were diagnosed with ASD; 
the median non-verbal IQ was 84. 

Exome capture, sequencing and variant prediction. Whole-blood DNA was 
enriched for exonic sequences through hybridization with a NimbleGen custom 
array (N = 210) or EZExomeV2.0 (N = 718). Captured DNA was sequenced using 
an Illumina GAIIx (N = 592) or HiSeq 2000 (N = 336). Short read sequences were 
aligned to hg18 with BWA®, duplicate reads were removed and variants were 
predicted using SAMtools’. Data were normalized within families by only analys- 
ing bases with at least 20 unique reads in all family members. De novo predictions 
were made blinded to affected status using experimentally verified thresholds 
(Supplementary Information). All de novo variants were confirmed using 
Sanger sequencing blinded to affected status. 
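
The within-family normalization described above amounts to keeping only positions where every family member reaches the read-depth threshold. A minimal sketch, assuming per-position depths are already available as equal-length lists; the function name, data layout and depth values are hypothetical.

```python
def callable_positions(depth_by_member, min_depth=20):
    """Return indices of positions with >= min_depth reads in all members.

    depth_by_member maps a family member label to a list of unique-read
    depths, one entry per targeted base, with all lists the same length.
    """
    depth_lists = list(depth_by_member.values())
    n_positions = len(depth_lists[0])
    return [
        i for i in range(n_positions)
        if all(depths[i] >= min_depth for depths in depth_lists)
    ]


# Hypothetical depths at four targeted bases in one quartet.
family = {
    "father": [25, 30, 10, 40],
    "mother": [22, 19, 35, 40],
    "proband": [20, 50, 60, 21],
    "sibling": [30, 30, 30, 30],
}
print(callable_positions(family))
```

Only positions that pass in every member contribute to de novo detection, so probands and siblings from the same family are compared over an identical set of bases.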

Gene annotation. Variants were analysed against RefSeq hg18 gene definitions; 
in genes with multiple isoforms the most severe outcome was chosen. All 
nonsense and canonical splice site variants were present in all RefSeq isoforms. 
A variant was listed as altering the splice site only if it disrupted canonical 
2-base-pair acceptor (AG) or donor (GT) sites. Brain-expressed genes were iden- 
tified from expression array analysis across 57 post-mortem brains (age 6 weeks 
post conception to 82 years) and multiple brain regions; 80% of RefSeq genes were 
included in this subset”*. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Sample selection. In total 238 families (928 individuals) were selected from the 
SSC on the basis of: male probands with autism, low non-verbal IQ (NVIQ), and 
discordant Social Responsiveness Scale (SRS) with sibling and parents (N = 40); 
female probands (N = 46); multiple unaffected siblings (N = 28); probands with 
known multigenic CNVs (N = 15); and random selection (N = 109). Thirteen 
families (6%) did not pass quality control (Supplementary Information) leaving 
225 families (200 quartets, 25 trios) for analysis (Supplementary Data 1). Of the 200 
quartets, 194 (97%) probands had a diagnosis of autism and 6 (3%) were diagnosed 
with ASD; the median NVIQ was 84. Three of these quartets have previously been 
reported as trios*; there is no overlap between the current sample and those pre- 
sented in the companion article”. 

Exome capture, sequencing and variant prediction. Whole-blood DNA was 
enriched for exonic sequences (exome capture) through hybridization with a 
NimbleGen custom array (N= 210) or EZExomeV2.0 (N= 718). The captured 
DNA was sequenced using an Illumina GAIIx (N = 592) or HiSeq 2000 (N = 336). 
Short read sequences were aligned to hg18 with BWA‘, duplicate reads were removed 
and variants were predicted using SAMtools’. The data were normalized across each 
family by only analysing bases with at least 20 unique reads in all family members 
(Supplementary Information). De novo predictions were made blinded to affected 
status using experimentally verified thresholds (Supplementary Information). All de 
novo variants were confirmed using Sanger sequencing blinded to affected status. 

Variant frequency. The allele frequency of a given variant in the offspring was 
determined by comparison with dbSNPv132 and 1,637 whole-exome controls 
including 400 parents. Variants were classified as: ‘novel’, if only a single allele 
was present in a parent and none were seen in dbSNP or the other control exomes; 
‘rare’, if they did not meet the criteria for novel and were present in <1% of 
controls; and ‘common’, if they were present in ≥1% of controls. 
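
The three frequency classes amount to a short decision rule. The following is a minimal sketch only: the function name, its arguments and the reading of the 'common' threshold as inclusive at 1% are illustrative assumptions, not the pipeline used in the study.

```python
def classify_variant(parent_allele_count, in_dbsnp, other_control_count, control_fraction):
    """Classify a variant as 'novel', 'rare' or 'common' (hypothetical helper).

    parent_allele_count : copies of the allele observed in the parents
    in_dbsnp            : True if the variant is listed in dbSNP
    other_control_count : copies observed in the other control exomes
    control_fraction    : fraction of controls carrying the variant (0.0 to 1.0)
    """
    # 'novel': a single allele in a parent, absent from dbSNP and other controls
    if parent_allele_count == 1 and not in_dbsnp and other_control_count == 0:
        return "novel"
    # 'rare': not novel, and present in fewer than 1% of controls
    if control_fraction < 0.01:
        return "rare"
    # 'common': present in 1% or more of controls
    return "common"
```

Checking the novel criteria first mirrors the wording above, in which 'rare' is defined only for variants that fail the novel test.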

Gene annotation. Variants were analysed against the RefSeq hg18 gene defini- 
tions, a list that includes 18,933 genes. Where multiple isoforms gave varying 
results the most severe outcome was chosen. All nonsense and canonical splice 
site variants were checked manually and were present in all RefSeq isoforms. A 
variant was listed as altering the splice site only if it disrupted canonical 2-base-pair 
acceptor (AG) or donor (GT) sites. 

Brain-expressed genes. A list of brain-expressed genes was obtained from 
expression array analysis across 57 post-mortem brains (age 6 weeks post concep- 
tion to 82 years) and multiple brain regions”. Using these data, 14,363 (80%) of 
genes were classified as brain-expressed (Supplementary Information). 


Rate of de novo SNVs. To allow an accurate comparison between the de novo 
burden in probands and siblings, the number of de novo SNVs found in each 
sample was divided by the number of bases analysed (that is, bases with ≥20 
unique reads in all family members) to calculate a per-base rate of de novo 
SNVs. Rates are given in Table 1. 
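
As an illustration of this normalization, the sketch below counts the bases that meet the read-depth criterion in every family member and converts a de novo count into a per-base rate. The input layout (one list of per-base unique-read depths per family member) and the function names are assumptions made for the example.

```python
def callable_bases(depths_by_member, min_reads=20):
    """Count positions with at least `min_reads` unique reads in all family members.

    depths_by_member: list of equal-length lists, one per family member,
    giving the unique-read depth at each position (hypothetical layout).
    """
    return sum(1 for column in zip(*depths_by_member) if min(column) >= min_reads)


def per_base_rate(n_de_novo, n_callable_bases):
    """De novo SNVs per analysed base."""
    if n_callable_bases <= 0:
        raise ValueError("no bases passed the depth threshold")
    return n_de_novo / n_callable_bases
```

Dividing by the analysed bases rather than by the number of samples keeps the comparison between probands and siblings fair when coverage differs between families.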

Simulation model. The likelihood of observing multiple independent de novo 
events of a given type for a given sample size in an ASD risk-conferring gene was 
modelled using gene size and GC content (derived from the full set of brain- 
expressed RefSeq genes) and the observed rate of brain-expressed de novo variants 
in probands and siblings. These values were then used to evaluate the number of 
genes contributing to ASD showing two or more variants of the specified type 
(Fig. 2); comparing this to the number of genes with similar events not carrying 
ASD risk gave the likelihood of the specified pattern demonstrating association 
with ASD. The simulation was run through 150,000 iterations across a range of 
sample sizes and multiple models of locus heterogeneity (Supplementary 
Information). 

Severity scores. Severity scores were calculated for missense variants using web- 
based interfaces for PolyPhen2, SIFT and GERP, using the default settings 
(Supplementary Information). PhyloP and Grantham Score were determined 
using an in-house annotated script. For nonsense/splice site variants the 
maximum score was assigned for Grantham, SIFT and PolyPhen2; for GERP 
and PhyloP, every possible coding base for the specific protein was scored and 
the highest value selected. 

Pathway analysis. The list of brain-expressed genes with non-synonymous de 
novo SNVs was submitted to KEGG using the complete set of 14,363 brain- 
expressed genes as the background to prevent bias. For IPA the analysis was based 
on human nervous system pathways only, again to prevent bias. Otherwise default 
settings were used for both tools. 

Protein-protein interactions. Genes with brain-expressed non-synonymous de 
novo variants in probands were submitted to the Disease Association Protein— 
protein Link Evaluator (DAPPLE)” using the default settings. 

Comparing de novo SNV counts to gene lists. To assess whether non- 
synonymous de novo SNVs were enriched in particular gene sets, the chance of 
seeing a de novo variant in each gene on a given list was estimated based on the size 
and GC content of the gene. The observed number of de novo events was then 
assessed using the binomial distribution probability based on the total number of 
non-synonymous de novo variants in probands and the sum of probabilities for 
de novo events within these genes. 
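
The binomial assessment described above can be sketched as follows. This is an illustration under stated assumptions: the per-gene probabilities are taken as given (the size- and GC-based estimation is not reproduced), the one-sided upper tail is our choice of test statistic, and the function names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_p_value(observed_in_list, total_variants, gene_probabilities):
    """Chance of seeing at least `observed_in_list` de novo events in a gene list.

    total_variants     : all non-synonymous de novo variants in probands
    gene_probabilities : per-gene chance that a single de novo variant falls
                         in that gene (assumed precomputed from size and GC)
    """
    p_list = sum(gene_probabilities)  # chance that one variant hits any listed gene
    return binomial_tail(observed_in_list, total_variants, p_list)
```

Summing the per-gene probabilities treats the gene list as a single target, so one binomial trial corresponds to one de novo variant.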
An inverse relationship to germline transcription 
defines centromeric chromatin in C. elegans 

Centromeres are chromosomal loci that direct segregation of the 
genome during cell division. The histone H3 variant CENP-A (also 
known as CenH3) defines centromeres in monocentric organisms, 
which confine centromere activity to a discrete chromosomal 
region, and holocentric organisms, which distribute centromere 
activity along the chromosome length’*. Because the highly 
repetitive DNA found at most centromeres is neither necessary 
nor sufficient for centromere function, stable inheritance of 
CENP-A nucleosomal chromatin is postulated to propagate 
centromere identity epigenetically*. Here, we show that in the 
holocentric nematode Caenorhabditis elegans pre-existing 
CENP-A nucleosomes are not necessary to guide recruitment of 
new CENP-A nucleosomes. This is indicated by lack of CENP-A 
transmission by sperm during fertilization and by removal and 
subsequent reloading of CENP-A during oogenic meiotic pro- 
phase. Genome-wide mapping of CENP-A location in embryos 
and quantification of CENP-A molecules in nuclei revealed that 
CENP-A is incorporated at low density in domains that cumula- 
tively encompass half the genome. Embryonic CENP-A domains 
are established in a pattern inverse to regions that are transcribed 
in the germline and early embryo, and ectopic transcription of 
genes in a mutant germline altered the pattern of CENP-A incorp- 
oration in embryos. Furthermore, regions transcribed in the 
germline but not embryos fail to incorporate CENP-A throughout 
embryogenesis. We propose that germline transcription defines 
genomic regions that exclude CENP-A incorporation in progeny, 
and that zygotic transcription during early embryogenesis 
remodels and reinforces this basal pattern. These findings link 
centromere identity to transcription and shed light on the evolu- 
tionary plasticity of centromeres. 

To characterize CENP-A localization dynamics in C. elegans 
(CeCENP-A; Supplementary Figs 1 and 2), we generated a strain in 
which the only source of CENP-A is a single copy green fluorescent 
protein (GFP)-conjugated transgene encoding GFP-CeCENP-A (Sup- 
plementary Fig. 1b). Imaging in adult hermaphrodites revealed that, in 
the maternal germline, CeCENP-A is removed from chromosomes as 
they enter the pachytene stage of meiotic prophase, and is reloaded when 
nuclei progress into diplotene (Fig. 1a). CeCENP-A was not detected in 
the nuclei of mature sperm (Supplementary Fig. 2a, b), and quantitative 
immunoblotting indicated that sperm have fewer than the detection 
limit of 300 CeCENP-A molecules (Fig. 1b and Supplementary 
Fig. 2c, d). To test for sperm-derived CeCENP-A in embryos, we ferti- 
lized CeCENP-A-depleted oocytes with wild-type sperm. In control 
embryos, CeCENP-A localized to both sperm and oocyte chromatin 
during chromosome condensation, pronuclear migration and mitosis 
(Fig. 1c). As reported in other systems, this recruitment was independent 
of DNA replication (Supplementary Fig. 3). After fertilization of 
CeCENP-A-depleted oocytes with wild-type sperm, no CeCENP-A 
signal was detected on sperm or oocyte chromatin throughout the cell 
cycle (Fig. 1c). Thus, sperm chromatin does not retain CeCENP-A to 
propagate centromere identity through fertilization in C. elegans. 

Pulse-chase experiments in human cells have suggested that stable 
inheritance of CENP-A on chromatin propagates centromere identity 
through cell division. To test whether CeCENP-A is stably inherited 
on chromatin in embryos, we photobleached one set of GFP- 
CeCENP-A-labelled chromatids after separation from their sisters at 
anaphase onset in the one-cell embryo. GFP-CeCENP-A signals were 
then compared in the next round of division between cells inheriting 
bleached or unbleached chromatid sets. Stable inheritance of 
CeCENP-A on chromatin predicts that 50% of CeCENP-A is old 
and the other 50% is new, resulting in a bleached/unbleached ratio 
of 0.5. In contrast, if CeCENP-A is not stably inherited on chromatin 
between the two rounds of division, the bleached/unbleached ratio 
should be 1.0, which is close to the observed value (Fig. 1d-f and 
Supplementary Fig. 4). Thus, despite the short division time (only 
~15 min between consecutive metaphases), CeCENP-A is nearly 
completely turned over on chromatin during embryonic cell divisions. 
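
The two predicted ratios follow from simple bookkeeping, sketched below. The model is a deliberate simplification that we assume for illustration: bleached (old) molecules contribute no signal, newly loaded molecules are fully fluorescent, and both chromatid sets carry the same total amount of CeCENP-A in the second division.

```python
def expected_bleach_ratio(old_fraction):
    """Bleached/unbleached signal ratio in the division after photobleaching.

    old_fraction: share of chromatin-bound CeCENP-A in the second division
    that was already on chromatin when the bleach was applied.
    """
    bleached_signal = 1.0 - old_fraction  # old molecules on the bleached set are dark
    unbleached_signal = 1.0               # old and new molecules both fluoresce
    return bleached_signal / unbleached_signal


def old_fraction_from_ratio(ratio):
    """Invert the model: infer the retained (old) share from an observed ratio."""
    return 1.0 - ratio
```

Under this model, stable inheritance (old_fraction = 0.5) gives a ratio of 0.5 and complete turnover (old_fraction = 0) gives 1.0, matching the two predictions in the text.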
The above results indicate that pre-existing CeCENP-A nucleo- 
somes may not be the cue that targets new CeCENP-A nucleosomes. 
To define the unknown guiding cue(s), we analysed the genome-wide 
distribution of CeCENP-A in embryos using chromatin immunopre- 
cipitation with a CeCENP-A-specific antibody* followed by hybridiza- 
tion to a tiling microarray (ChIP-chip). As CENP-A chromatin is 
characterized by highly repetitive DNA in most higher eukaryotes, 
this offered the opportunity to define the distribution of CENP-A in 
an organism naturally lacking large stretches of repeats. Our ChIP- 
chip analysis revealed regions of CeCENP-A enrichment along the 
entire length of chromosomes, as predicted for holocentric chro- 
mosome architecture (Fig. 2a and Supplementary Fig. 5a—c). The 
genome-wide distribution of the conserved CENP-A-specific loading 
factor KNL-2 was indistinguishable from that of CeCENP-A, indi- 
cating that the CeCENP-A distribution reflects specific incorporation 
(Fig. 2a, b). A sliding-window-based domain definition algorithm 
revealed that CeCENP-A domains vary considerably in size (median 
10-12 kilobases), cover both genic and intergenic regions, are distributed 
evenly throughout the genome, and do not correlate with repeat density 
(Fig. 2c-e and Supplementary Fig. 5d-f). Although nearly half the 
genome is occupied by CeCENP-A domains (Fig. 2d), quantification 
of CeCENP-A molecules in purified embryonic nuclei showed that 
there is only enough CeCENP-A to occupy at most 4% of the genome 
(Fig. 2e and Supplementary Fig. 6a-c). Therefore, the domains enriched 
for CeCENP-A identified by ChIP-chip must be composed primarily of 
H3 nucleosomes (Fig. 2f). Consistent with this, histone H3 ChIP-chip 
analysis does not show depletion in regions enriched for CeCENP-A 
(Supplementary Fig. 6d, e). Thus, the holocentric architecture of 
C. elegans chromosomes arises from reproducible definition of domains 
that are permissive for low-density CeCENP-A incorporation. 

Figure 1 | CeCENP-A dynamics in meiotic 
prophase, at fertilization and across embryonic 

The abundance of genomic regions permissive for CeCENP-A 
incorporation makes it unlikely that they are defined by a specific 
DNA sequence. Instead, a correlation emerged with transcriptional 
status: genes transcribed in embryos were refractory to CeCENP-A 
incorporation, whereas genes that are silent in embryos (but tran- 
scribed in post-embryonic tissues) were permissive for CeCENP-A 
incorporation (Fig. 3a). ChIP-chip analysis of RNA polymerase II 
(Pol II) revealed an inverse correlation with CeCENP-A that extended 
genome-wide (Fig. 3b, c and Supplementary Fig. 7). 

The inverse correlation between transcription and CeCENP-A 
incorporation was puzzling, given that there is no significant RNA 
Pol II-dependent transcription during the first two rounds of embry- 
onic division, and transcriptional activity remains low until the 30-cell 
stage. In addition, inhibition of transcription using α-amanitin did 
not cause defects in chromosome segregation in early embryos 
(Supplementary Fig. 8). We analysed the CeCENP-A and RNA 
Pol II distribution in four populations of embryos that formed a 
developmentally timed series from very early (73% of embryos with 
≤8 nuclei) to old (67% of embryos with >200 nuclei) (Supplementary 
Fig. 9a, b). The CeCENP-A distribution remained constant across this 
series, despite the activation or repression of genes (Supplementary 
Figs 9b-d and 10a, b). Thus, CeCENP-A incorporation in embryos 
may not be dictated simply by active transcription. 

CeCENP-A depleted embryos at different stages of the first 
mitotic division were immunostained for 
CeCENP-A and α-tubulin (MT). Wild-type (N2) 
males were mated to fem-1 mutant worms to 
ensure all embryos were cross-progeny. Scale bar, 
5 μm. d, Schematic of photobleaching experiment 
to assay CeCENP-A inheritance across early 
embryonic divisions. par-6 RNA interference 
(RNAi) abolishes developmental asynchrony in the 
two-cell embryo. Unbleached (U) and bleached (B) 
chromatid sets are indicated. Scale bar, 2 μm. 
e, Representative images and quantification of the 
photobleaching experiment. Higher magnification 
views highlight bleached and unbleached 
chromatid sets. Error bars are 95% confidence 
intervals for the means. Scale bars, 5 μm. 

To better assess the relationship between transcription and 
CeCENP-A incorporation, we analysed CeCENP-A and RNA Pol II 
enrichment in different gene classes defined in previous work by their 
expression profiles. The inverse correlation between CeCENP-A and 
RNA Pol II held true for four of five gene classes (Fig. 3d). However, the 
‘germline-only’ class, which comprises genes transcribed in the 
maternal germline but not in embryos, did not show an inverse correla- 
tion. Instead, both CeCENP-A and embryonic RNA Pol II levels were 
low (Fig. 3e), which was confirmed by individual analysis of well- 
characterized genes transcribed exclusively in the germline (Fig. 3f). 
In addition, ‘germline-only’ genes failed to incorporate CeCENP-A 
throughout embryogenesis, despite the persistent absence of RNA Pol 
II (Supplementary Fig. 10a, b). Germline-only genes are enriched for 
histone H3 lysine 36 methylation (H3K36me), indicating that the 
absence of CeCENP-A and RNA Pol II signals is not a false negative’’ 
(Fig. 3d-f). Thus, genes transcribed in the maternal germline are 
refractory to CeCENP-A incorporation, even if they are transcriptionally 
silent in embryos. This result, together with the paucity of transcription 
in early embryos and the consistent CeCENP-A distribution throughout 
embryogenesis, indicates that transcriptional activity in the maternal 
germline may render genomic regions refractory to CeCENP-A 
incorporation. Activation of transcription during early embryogenesis 
probably remodels and reinforces this basal CeCENP-A pattern. In 
support of this, genes transcribed in early embryos, but lacking signa- 
tures of germline transcription, also show low CeCENP-A occupancy 
(Supplementary Fig. 10c). 

Figure 2 | Genome-wide mapping of CeCENP-A-enriched chromatin. 
a, Regions enriched for CeCENP-A and its loading factor KNL-2 in a 
representative portion of chromosome I. For each track, the average z-score 
probe signal of two independent biological replicates is plotted. b, Genome- 
wide correlation plot of CeCENP-A and KNL-2 occupancy. The correlation 
coefficient (r) is in the upper left corner. c, Regions enriched for CeCENP-A 
with the positions of annotated genes. CeCENP-A domains were defined by a 
sliding window algorithm. d, Features of CeCENP-A domains for individual 
chromosomes. Boxplots: boxes indicate 25th to 75th percentile, whiskers 2.5th 
to 97.5th percentile. Wedges around the medians indicate 95% confidence 
intervals for the medians (see also Supplementary Fig. 5d-f). e, Two 
independent nuclei preparations (Prep) from early embryos (<100 nuclei) 
were blotted alongside a purified CeCENP-A standard (see also Supplementary 
Figs 2c, d and 6a-c). f, Hypothetical model for CeCENP-A permissive domain. 

If germline transcriptional activity influences CeCENP-A incorporation 
in the progeny, changes in germline transcription should alter 
the CeCENP-A distribution in embryos. An unexpected observation 
in embryos derived from a null mutant of met-1, which encodes one of 
two C. elegans H3K36 methyltransferases, enabled us to test this 
prediction. The met-1 mutant is viable and fertile, indicating that 
transcription is not globally misregulated in this mutant. Consistent 
with this, the genome-wide distributions of H3K36me3, RNA Pol II 
and CeCENP-A were similar in met-1 and wild-type embryos (Fig. 4a; 
genome-wide correlation coefficients of 0.88, 0.9 and 0.9, respectively). 
However, we observed rare regions of ectopic H3K36me3 enrichment 
in the met-1 mutant (Fig. 4 and Supplementary Figs 11 and 12). Out 
of 132 regions >5 kb in size that acquired ectopic H3K36me3 
signal in met-1 mutant embryos, 75 did not show significant RNA 
Pol II occupancy in embryos, suggesting that these regions are mis- 
transcribed in mutant germlines but not in embryos. CeCENP-A was 
depleted from these regions in embryos (Fig. 4a, b and Supplementary 
Figs 11 and 12a). To test if acquisition of H3K36me3 and loss of 
CeCENP-A in met-1 mutant embryos is associated with ectopic germ- 
line gene transcription, we hand-dissected germlines from adult wild- 
type and met-1 mutant worms and measured messenger RNA levels by 
quantitative PCR for nine genes in ectopic H3K36me3 regions. As 
controls, we used eight genes located in regions that did not show a 
change in H3K36me3 signal. Genes in regions with ectopic H3K36me3 
signal indeed showed significantly elevated RNA levels in met-1 
mutant germlines compared to wild type (Fig. 4b and Supplemen- 
tary Fig. 12b). Thus, the data obtained with the met-1 mutant indicate 
that ectopic transcription in the germline converts regions from permissive 
to non-permissive for CeCENP-A incorporation in the 
embryo progeny. 

Figure 3 | Relationship between CeCENP-A and gene expression. 
a, Chromosomal region containing the cle-1 gene, which is expressed in 
neurons, and flanking genes that are expressed during embryogenesis. 
b, Genome browser view showing inverse correlation between CeCENP-A and 
RNA Pol II occupancy. c, Genome-wide correlation plot of CeCENP-A and 
RNA Pol II occupancy. The correlation coefficient (r) is in the upper right 
corner. d, CeCENP-A, KNL-2, RNA Pol II and H3K36me3 occupancy for 
various gene sets defined on the basis of expression data. The number of genes 
in each set is shown in parentheses. Boxplots (as in Fig. 2d) show the range of 
z-scores averaged over gene bodies. e, CeCENP-A, KNL-2, RNA Pol II and 
H3K36me3 occupancy for the germline-only gene set. f, Genome browser views 
of CeCENP-A, RNA Pol II, and H3K36me3 occupancy on germline-only genes, 
flanked by genes expressed in embryos. 

Whereas the genome-wide inverse relationship to embryonic tran- 
scription suggests the simple model that CeCENP-A deposition is 
random and antagonized by active transcription, such a model fails 
to explain restriction of de-novo-deposited CeCENP-A on transcrip- 
tionally silent sperm chromatin (Fig. 1d), the low CeCENP-A occu- 
pancy on the 169 ‘germline-only’ genes throughout embryogenesis, 
and the results from the met-1 mutant analysis (Supplementary 
Discussion). Thus, we favour the model that transcription in the germ- 
line makes regions non-permissive for CeCENP-A incorporation in 
the progeny (Supplementary Fig. 12c), and the onset of transcriptional 
activity in embryos reinforces and remodels this pattern. 

In C. elegans, H3K36 methylation and the Argonaute CSR-1, which 
binds short 22G-RNAs (named for their 5’ guanosine residue and 22- 
nucleotide length) derived from germline transcripts, are candidate 
mechanisms for transmitting memory of germline transcription to 
early embryos. Both H3K36 methylation (Fig. 3d-f and Supplementary 
Fig. 7c) and CSR-1 22G-RNA targets (Supplementary 
Fig. 13) are inversely correlated with CeCENP-A occupancy, and 
inhibition of CSR-1 and its co-factors leads to early embryonic chromosome 
segregation defects. 

The results here demonstrate that cues unrelated to pre-existing 
CENP-A nucleosomes can dictate the incorporation of new CENP-A 
nucleosomes, challenging the view that centromeres are patterned by 
stable inheritance of CENP-A domains with the key mark for centro- 
mere identity being CENP-A itself. Discontinuity of chromatin- 
localized CENP-A in the germline, similar to the one we describe here 
for C. elegans, has recently been proposed to also occur in plants’’, 
suggesting that removal and reloading of CENP-A during every 
generation may be more common than is currently appreciated. In 
addition, the processes of neocentromerization and centromere 
repositioning, which occur with appreciable frequency in humans 
and are observed frequently during evolution, may be guided by 
cues that are linked to transcription. 

Figure 4 | Germline expression controls CeCENP-A occupancy in the 
progeny embryos. a, Portion of chromosome I featuring specific regions with 
ectopic H3K36me3 signal in the met-1 mutant (see also Supplementary Figs 11 
and 12). b, Screen shots of the regions boxed in (a). Real-time quantitative 
reverse transcription PCR was performed on hand-dissected wild-type and 
met-1 mutant gonads. Mean met-1:wild-type expression ratio (four 
independent biological replicates each) is listed above genes (see table in 
Supplementary Fig. 12b for all genes analysed). 


METHODS SUMMARY 

For ChIP-chip, chitinase-treated embryos were fixed with 1% formaldehyde in 
PBS for 10 min, suspended in ChIP buffer (50 mM HEPES-KOH pH 7.6, 140 mM 
NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.5% NP-40, 0.1% deoxycholate, 1% sarkosyl), 
and sonicated with a Branson sonifier microtip. Antibody (5 μg) pre-bound to 
50 μl Dynabeads (Dynal Biotech) was incubated for 4 h with embryo extract 
(3 mg total protein). Beads were washed and eluted, and the purified DNA was 
amplified as described before hybridization at the Roche Nimblegen Service 
Laboratory (2.1M probe tiling arrays with 50-bp probes; WormBase version 
WS170). Genome-wide scatter plots and Pearson correlations were obtained using 
log2 z-scores after median smoothing over 1-kb windows. GFP-CeCENP-A 
images were acquired with a CSU10 spinning disk confocal head (Yokogawa) 
and a CCD camera (iXon DV887; Andor Technology) mounted on a Nikon 
TE2000-E inverted microscope equipped with a solid-state laser combiner 
(ALC) with 491 nm and 561 nm lines. For quantification of CeCENP-A in nuclei, 
embryos were treated with chitinase and lysed by douncing in nuclei buffer 
(10 mM Tris-HCl pH 8, 80 mM KCl, 2 mM K-EDTA, 0.75 mM spermidine, 
0.3 mM spermine, 0.1% digitonin). Nuclei were separated from debris by low- 
speed centrifugation steps. For germline expression analysis, total RNA from 
50-100 dissected gonads was isolated using TRIzol (Invitrogen), and complementary 
DNA was synthesized with Superscript III reverse transcriptase (Invitrogen). 
Quantitative real-time PCR was performed with iQ SYBR Green Supermix 
(Bio-Rad) in the iQ5 cycler (Bio-Rad) using standard protocols. 
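
The genome-wide correlation step mentioned above (median smoothing over 1-kb windows followed by a Pearson correlation) can be sketched in a few lines. This is an illustration under assumptions: the window is taken as centred on each probe, probes are represented by a single genomic coordinate, and the function names are ours rather than those of the original analysis.

```python
from math import sqrt
from statistics import median


def median_smooth(positions, scores, window=1000):
    """Replace each probe score by the median of scores within a centred window.

    positions: probe coordinates in bp; scores: matching z-scores.
    """
    half = window / 2
    smoothed = []
    for centre in positions:
        in_window = [s for p, s in zip(positions, scores) if abs(p - centre) <= half]
        smoothed.append(median(in_window))
    return smoothed


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```

The quadratic-time smoothing is kept for clarity; a tiling array with millions of probes would need a sliding-window implementation instead.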


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Antibodies. ChIP for HCP-3/CeCENP-A was performed with OD79, an affinity- 
purified rabbit polyclonal antibody raised against amino acids 3-183 of CeCENP-A. 
A second antibody (SDQ0804) raised against amino acids 40-118 of CeCENP-A 
confirmed the ChIP-chip pattern observed with antibody OD79 (genome-wide 
correlation coefficient of 0.92). ChIP for KNL-2 was performed with two 
polyclonal rabbit antibodies (SDQ08003, SDQ08010) raised against amino acids 
207-306. ChIP for RNA Pol II was performed using a monoclonal antibody 
against the CTD repeat YSPTSPS of RNA Pol II (8WG16; abcam ab817mod). 
The antibody for H3K36me3 ChIP was described previously. Other antibodies 
used for immunoblotting and immunofluorescence were: α-tubulin (DM1α; 
Sigma-Aldrich), NPP-112 and SQV-8 (gifts from J. Audhya), BUB-1 (ref. 21) 
and KNL-2 (ref. 22). 

Worm strains. For monitoring GFP-CeCENP-A and GFP-CPAR-1 localization 
in live adult gonads, sperm and embryos (Fig. 1 and Supplementary Figs 1 and 2), 
and for the photobleaching experiments of GFP-CeCENP-A in Fig. 1, worm 
strains expressing transgenes from a single-copy locus that includes endogenous 
5' and 3' regions were generated using the MosSCI technique, as outlined in 
Supplementary Fig. 1b. The GFP-CeCENP-A strain OD347 fully rescued the 
CeCENP-A/hcp-3 deletion allele ok1892, and all experiments involving GFP- 
CeCENP-A were performed in the ok1892 deletion background. For imaging, 
the GFP-CeCENP-A (OD347) and GFP-CPAR-1 (OD588) strains were crossed 
with a strain expressing mCherry-histone H2b (OD56) to generate strains OD421 
and OD416, respectively. 

For the mating-based analysis of CeCENP-A inheritance through fertilization 
(Fig. 1c), wild-type (N2) males were crossed to strain BA17, which harbours a 
temperature-sensitive mutation that abrogates sperm production. Use of the fem-1 
strain ensured that the embryos analysed by immunofluorescence were cross- 
progeny. Injection of double-stranded RNA targeting CeCENP-A and mating with 
males were performed as previously described. 

For the transcription inhibition assay (Supplementary Fig. 8), GFP-CeCENP-A-expressing 
hermaphrodites (OD347) were mated with his-72p:GFP-H3.3 and 
end-3p:mCherry-H1 co-expressing males (RW10007). 

For the photobleaching analysis in Supplementary Fig. 4, a strain expressing 
GFP-CeCENP-A was generated by bombarding plasmid pJM12 into unc-119(ed3) 
worms. In pJM12, a GFP-CeCENP-A transgene is expressed under 
control of the pie-1 promoter and 3’ untranslated region (UTR). The GFP is 
inserted at amino acid 174 between the amino-terminal tail and histone core 
(Supplementary Fig. 4a). Coding sequence and introns preceding amino acid 
134 were altered to preserve coding information, but make the transgene-encoded 
mRNA resistant to RNAi. The two introns in this part of the CeCENP-A locus were 
replaced with introns from SPhased GFP (Fire lab 2005 vector kit). Ballistic 
bombardment of pJM12 generated strain OD136. The transgene insertion in 
OD136 cannot be homozygosed—there is a low amount of embryonic lethality 
(12-14%) and Unc progeny. This is probably due to the transgene insertion site, as 
both of these phenotypes segregated with the GFP fluorescence through multiple 
outcrosses. A dsRNA to the re-encoded region was used to selectively deplete 
endogenous CeCENP-A and assess functional rescue by the transgene 
(Supplementary Fig. 4b). OD136 was used to generate strain OD265, where one 
copy of the endogenous CeCENP-A locus is deleted and GFP-CeCENP-A as well 
as mCherry—H2b are co-expressed. Photobleaching experiments in Supplemen- 
tary Fig. 4d, e were performed using OD265. Strain genotypes are listed in 
Supplementary Table 1. 

RNA interference. L4 worms were injected with dsRNA prepared as described 
previously and incubated for 48 h at 20 °C, except for the mating experiment 
using the fem-1 mutant (see above). 

N2 genomic DNA was used as a template to generate PCR products for dsRNA 
production. The dsRNA for depletion of CeCENP-A in the inheritance experi- 
ment of Fig. 1c was described previously’. 

Oligonucleotides for production of dsRNA against the re-encoded sequence of 
CeCENP-A used in the rescue experiments with strain OD136 (Supplementary 
Fig. 4a, b): oOD1887, 5'-AATTAACCCTCACTAAAGGgccgatgacaccccaattat-3'; 
oOD1888, 5'-TAATACGACTCACTATAGGccgtgggagtaatcgacaag-3'. Oligonucleotides 
for dsRNA against GFP: oOD2423, 5'-TAATACGACTCACTATAGGgtcagtgga 
gagegtggaagestg-3'; oOD2424, 5'-AATTAACCCTCACTAAAGGcatgccatgtgt 
aatcccagcage-3'. 

For replication inhibition (Supplementary Fig. 3), dsRNAs targeting cdc-6 and 
cdt-1 were mixed to obtain equal concentrations. Oligonucleotides used for 
dsRNA against cdc-6: oOD1265, 5'-AATTAACCCTCACTAAAGGCAAATTC 
CTGCTGCTCCAAT-3'; oOD1266, 5'-TAATACGACTCACTATAGGCGGT 
CGAACCTCAAGTTCAT-3'. Oligonucleotides used for dsRNA against cdt-1: 
oOD801, 5'-AATTAACCCTCACTAAAGGCAAAAACAACGAAGCGTGTG-3'; 
oOD802, 5'-TAATACGACTCACTATAGGCCTCGTTTTTCATTTTATCATT 
CA-3'. 

Immunofluorescence and immunoblotting. Embryos were fixed and processed 
for immunofluorescence as described previously. Antibodies directly labelled 
with fluorescent dyes (Cy2, Cy3 or Cy5; Amersham Biosciences) were used at 
1 μg ml⁻¹. Images were recorded on a DeltaVision microscope at 1 × 1 binning 
with a ×100 numerical aperture (NA) 1.3 U-planApo objective (Olympus). 
Z-stacks (0.2-μm sections) were deconvolved using softWoRx (Applied 
Precision), and maximum intensity projections were imported into Adobe 
Photoshop CS4 for further processing. 

Immunoblotting was performed using standard methods. For CeCENP-A 
immunoblots (Figs 1b and 2e), proteins were transferred for 5 h at 30 V in 
25 mM Tris-HCl pH 8.3, 192 mM glycine, 20% methanol. These blotting condi- 
tions were optimized to result in quantitative transfer of CeCENP-A onto the 
membrane. 

Live imaging and photobleaching. All live imaging was performed at 20 °C. For 
images of adult hermaphrodite gonads (Fig. 1a and Supplementary Fig. 2a, b) 
worms were anesthetized with a mixture of 1 mg ml⁻¹ ethyl 3-aminobenzoate 
methanesulphonate and 0.1 mg ml⁻¹ tetramisole hydrochloride in M9 for 
15-30 min before transferring them to a 2% agarose pad under a coverslip. 
Images of gonad regions were acquired with a ×40 1.3 NA PlanFluor objective 
by collecting an 80 × 0.5 μm Z-series of GFP and mCherry images for every 
Z-plane. The whole gonad views shown in Fig. 1a and Supplementary Fig. 2a were 
stitched together from three individual, overlapping images. Embryos 
(Supplementary Fig. 1c) and −1 oocytes/spermatheca (Supplementary Fig. 2b) 
were imaged using a ×100 1.4 NA PlanApochromat objective. Images were 
acquired with 1 × 1 binning on a spinning disk confocal setup mounted on a 
Nikon TE2000-E inverted microscope equipped with a solid-state laser combiner 
(ALC) (491 nm and 561 nm lines), a Yokogawa CSU10 head and a CCD camera 
(iXon DV887; Andor Technology). Acquisition parameters, shutters and focus 
were controlled by iQ 1.10.0 software (Andor Technology). Images were processed 
with Fiji 1.0 and Adobe Photoshop CS4. 

For the transcription inhibition assay (Supplementary Fig. 8), cross-progeny 
embryos from GFP-CeCENP-A expressing hermaphrodites mated with his- 
72p:GFP-H3.3 and end-3p:mCherry-H1 co-expressing males were dissected in 
L-15 blastomere culture medium containing 200 μg ml⁻¹ α-amanitin. GFP and 
mCherry Z-stacks of permeable and impermeable embryos in the same field of 
view were acquired at 1-4 min intervals with a ×60 1.4 NA PlanApochromat 
objective until embryos contained more than 50 cells. 

The photobleaching experiments in Fig. 1d, e were performed with the FRAPPA 
unit (Andor Technology) using a ×60 1.4 NA PlanApochromat objective. 
Thirteen Z-sections were acquired with a spacing of 1 μm before and after bleach- 
ing in the first embryonic division and at 1-min intervals thereafter until anaphase 
of the second division. Maximum intensity projections were generated for each 
Z-stack. For each image sequence, an identical sized rectangle (R1) was drawn 
around each anaphase chromatid set before and after photobleaching in the first 
division and around anaphase chromatid sets in the second division. A larger 
rectangle (Rb) was drawn around each rectangle R1 and the area between the 
two rectangles served as a measure of background intensity. The average intensity 
(Avg. Int.) of the GFP signal in each R1 was measured using the formula: 
$\text{Avg. Int.}_{R1} - [(\text{Avg. Int.}_{Rb} \times \text{Area}_{Rb} - \text{Avg. Int.}_{R1} \times \text{Area}_{R1})/(\text{Area}_{Rb} - \text{Area}_{R1})]$, and 
the ratio of average intensities on anaphase chromatid sets before/after bleaching 
and in the subsequent anaphase were calculated and averaged. 
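
The background correction in the formula above subtracts the mean intensity of the frame between the two rectangles from the mean intensity inside R1. A minimal sketch follows; the function and argument names are illustrative and not taken from the original analysis software.

```python
def background_corrected_intensity(avg_r1, area_r1, avg_rb, area_rb):
    """Mean GFP intensity in R1 after subtracting the local background.

    avg_r1, area_r1: mean intensity and area of the inner rectangle R1
    avg_rb, area_rb: mean intensity and area of the larger rectangle Rb,
                     which fully contains R1
    """
    if area_rb <= area_r1:
        raise ValueError("Rb must be larger than R1")
    # Total intensity in the frame between Rb and R1, divided by the frame's area
    background = (avg_rb * area_rb - avg_r1 * area_r1) / (area_rb - area_r1)
    return avg_r1 - background
```

For example, an R1 of area 10 with mean 100, enclosed by an Rb of area 30 whose surrounding frame has mean 20, gives a corrected intensity of 80.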

For the photobleaching experiments in Supplementary Fig. 4d, e, the micro- 
scope setup differed from the one used for the experiments in Fig. 1 as follows: the 
microscope was equipped with a krypton-argon 2.5W water-cooled laser 
(Spectra-Physics), acquisition parameters, shutters and focus were controlled by 
MetaMorph software (MDS Analytical Technologies), and the 488 nm laser line 
for photobleaching was steered into a custom-modified epifluorescence port. 
GFP-CeCENP-A intensity ratios were calculated as described above, except that 
anaphase chromatid sets before/after bleaching in the first division were compared 
with metaphase plates in the second division. 

Expression analysis on dissected germlines by quantitative PCR. Worms con- 
taining one or two embryos were dissected with 30-gauge needles in Egg buffer 
(25 mM HEPES-KOH pH 7.6, 118 mM NaCl, 48 mM KCl, 2 mM CaCl₂, 2 mM 
MgCl₂) containing 1 mM levamisole and 0.5% Tween-20. Total RNA from 50–100 
gonads was isolated using TRIzol (Invitrogen), and cDNA was synthesized 
using Superscript III reverse transcriptase (Invitrogen). Quantitative real-time 
PCR was performed with iQ SYBR Green Supermix (Bio-Rad) in the iQ5 cycler 
(Bio-Rad) using standard protocols. The average amplification efficiency (E) of 
primer pairs (Supplementary Table 2) was calculated from two standard curves 
(tenfold dilution series of cDNA prepared from mixed-stage N2 worms). The 
relative transcript abundance (RTA) of target genes in met-1 mutant (met-1) 
versus wild-type (WT) germlines was calculated after normalization to the 
reference gene act-2 (actin homologue), using the formula: 

$\mathrm{RTA} = \dfrac{E_{\mathrm{target}}^{\,(C_t^{\mathrm{WT}} - C_t^{met\text{-}1})_{\mathrm{target}}}}{E_{\mathrm{reference}}^{\,(C_t^{\mathrm{WT}} - C_t^{met\text{-}1})_{\mathrm{reference}}}}$ 

where $C_t$ denotes the threshold cycle. The assay was performed in duplicate on 
four biological replicates each for wild-type and met-1 mutant germlines. 
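
As a worked illustration of an efficiency-corrected ratio of this kind, the sketch below estimates E from a standard curve and then computes RTA. It assumes the common convention that E = 10^(−1/slope) for a fit of threshold cycle against log10 of the dilution; the text does not state how E was derived from the curves, and all function and variable names here are hypothetical.

```python
import numpy as np

def amplification_efficiency(log10_dilution, ct):
    """Efficiency from a standard curve, assuming E = 10 ** (-1 / slope)."""
    slope, _intercept = np.polyfit(log10_dilution, ct, 1)
    return 10 ** (-1.0 / slope)

def relative_transcript_abundance(e_target, e_reference,
                                  ct_wt_target, ct_mut_target,
                                  ct_wt_reference, ct_mut_reference):
    """Mutant-versus-wild-type abundance of the target, normalized to the reference."""
    target_term = e_target ** (ct_wt_target - ct_mut_target)
    reference_term = e_reference ** (ct_wt_reference - ct_mut_reference)
    return target_term / reference_term

# Idealized tenfold dilution series in which every cycle doubles the product.
dilutions = np.array([0.0, -1.0, -2.0, -3.0])
cts = 20.0 - np.log2(10.0) * dilutions
efficiency = amplification_efficiency(dilutions, cts)

# Target crosses threshold one cycle earlier in the mutant; reference is unchanged.
rta = relative_transcript_abundance(efficiency, efficiency, 25.0, 24.0, 18.0, 18.0)
```

With perfect doubling, a one-cycle earlier threshold in the mutant corresponds to a twofold higher relative abundance.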
Quantification of CeCENP-A molecules in purified nuclei and sperm. 6xHis- 
CeCENP-A was expressed in bacteria, purified by nickel affinity chromatography 
under denaturing conditions followed by electroelution using the Bio-Rad Electro- 
Eluter. The concentration of the purified protein was measured relative to BSA on 
a gel. Purified sperm were a gift of S. Ward. Sperm concentration was measured by 
microscopy following 4′,6-diamidino-2-phenylindole (DAPI) staining. 

For isolation of nuclei, early embryos (<100 cells) were harvested from syn- 
chronized adult worms and treated with chitinase as described below for embryo 
extract preparation. Packed embryos (1 ml) were washed with 2 × 50 ml chilled 
Egg buffer (25 mM HEPES-KOH pH 7.6, 118 mM NaCl, 48 mM KCl, 2 mM 
CaCl₂, 2 mM MgCl₂), hypotonically swollen for 15 min in 10 ml of 0.5× Nuclei 
buffer (5 mM Tris-HCl pH 8, 40 mM KCl, 1 mM K-EDTA, 0.375 mM spermidine, 
0.15 mM spermine), then washed into 10 ml of 1× Nuclei buffer (10 mM Tris-HCl 
pH 8, 80 mM KCl, 2 mM K-EDTA, 0.75 mM spermidine, 0.3 mM spermine) 
supplemented with 0.1% digitonin (Sigma-Aldrich) and protease inhibitors, and 
immediately dounced with about 50 strokes in a 15-ml Wheaton Dounce homo- 
genizer using pestle B. Large debris was pelleted at 100g for 3 min and re-dounced 
once as above. Supernatants containing the nuclei were combined and spun at 
2,000g for 15 min. The nuclei pellet was suspended in 1× Nuclei buffer 
supplemented with 0.1% digitonin and protease inhibitors and layered onto a 30% (w/v 
in Nuclei buffer + 0.1% digitonin) sucrose cushion. Nuclei were recovered in the 
pellet after spinning at 2,000g for 15 min. 

Embryo isolation, fixation and extract preparation. N2 adult worms were 
grown from synchronized L1 larvae in S-basal medium. Batches of 500 ml in 
2.8-l Fernbach flasks shaking at 230 r.p.m. were incubated at 17 °C (early embryos) 
or 19 °C (late embryos) for approximately 65 h. The exact time of harvest was 
determined by checking embryo production under a microscope (for early 
embryos, this was five embryos per worm or less). Gravid adults were separated 
from debris by sucrose floating, and embryos were recovered by dissolving adults 
with a bleach/NaOH solution. 10 µl and 2 µl of packed embryos were set aside for 
expression profiling and staging by fluorescence microscopy after DAPI staining, 
respectively. The remainder was suspended in 2 volumes of Egg buffer (25 mM 
HEPES-KOH pH 7.6, 118 mM NaCl, 48 mM KCl, 2 mM CaCl₂, 2 mM MgCl₂) and 
incubated with 0.15 units ml⁻¹ chitinase (Sigma-Aldrich) until visible disintegration 
of the eggshell. Embryos were washed with 2 × 50 ml chilled phosphate-buffered 
saline (PBS) and suspended in 40 ml PBS. Fixation was performed for 
10 min on ice after adding 4 ml of cross-linking solution (11% formaldehyde, 
50 mM HEPES-KOH pH 8, 0.1 M NaCl, 1 mM Na-EDTA, 0.5 mM Na-EGTA), 
and excess formaldehyde was quenched with 120 mM glycine. Fixed embryos were 
washed with 3 X 50 ml PBS and suspended in five pellet volumes of ChIP buffer 
(50 mM HEPES-KOH pH 7.6, 140 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.5% 
NP-40, protease inhibitors). Sonication was performed using a Branson sonifier 
microtip in cycles of 10 s duration (0.9 s on, 0.1 s off) with the power setting at 25% 
(2 cycles), 30% (2 cycles), 35% (10 cycles), 40% (2 cycles), and 45% (2 cycles), and a 
pause of 1 min between cycles. Crude extracts were spun for 20 min at 10,000g, the 
supernatant was removed and glycerol added to 10%. Protein concentration was 
determined by the Bradford method and aliquots of 3 mg protein were flash- 
frozen in liquid nitrogen. 

Chromatin immunoprecipitation (ChIP). Extract corresponding to 3mg 
protein was diluted to 900 µl with ChIP buffer. After addition of sarcosyl to 1%, 
Na-deoxycholate to 0.1%, and PMSF to 1 mM, the extract was spun for 10 min at 
maximum speed in a tabletop centrifuge and the supernatant was used for ChIP. 
50 µl was removed for preparation of input DNA. To the rest, 50 µl of Dynabead 
Sheep anti-Rabbit or anti-Mouse IgG suspension (Dynal Biotech), pre-bound to 
5 µg of target antibody, were added, and the mixture was incubated at 4 °C for 4 h 
or overnight (<16h). Beads were recovered with a Dynal Magnetic Particle 
Concentrator (Invitrogen) and washed 25min with buffer FA (50mM 
HEPES-KOH pH7.6, 150mM NaCl, 1mM EDTA, 1% Triton X-100, 0.1% Na- 
deoxycholate), 10 min with FA-1000 (50mM HEPES-KOH pH7.6, 1M NaCl, 
1mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate), 10 min with FA-500 
(50 mM HEPES-KOH pH7.6, 500mM NaCl, 1mM EDTA, 1% Triton X-100, 
0.1% Na-deoxycholate) in a new tube, 10min with TEL buffer (10mM Tris- 
HCl pH 8.0, 0.25M LiCl, 1mM EDTA, 1% NP-40, 1% Na-deoxycholate), and 
once briefly with TE (10mM Tris pH8, 1mM EDTA). Antibody-bound 
chromatin was recovered in 50 µl Elution buffer (10 mM Tris-Cl pH 8.0, 1 mM 
EDTA, 250 mM NaCl, 1% SDS) by shaking at 67 °C for 15 min. Input and ChIP 
chromatin samples were subsequently processed in parallel. Cross-links were 
reversed overnight at 65 °C in Elution buffer, proteins were digested with 0.45 mg 
ml⁻¹ proteinase K for 2 h at 37 °C, and nucleic acids were recovered by phenol/ 
chloroform extraction and precipitation with ethanol. RNA was digested with 
0.3 mg ml⁻¹ RNase A in TE at 37 °C for 2 h, and DNA purified with a column 
(PCR Purification Kit, Qiagen). 

The ChIP procedure for the met-1 mutant analysis (Fig. 4 and Supplementary Figs 
11 and 12) differs slightly from the above and was described in detail previously. 
Ligation-mediated PCR. DNA ends were blunted with 5 units ml⁻¹ T4 DNA 
polymerase (New England Biolabs) at 12 °C for 20 min and the DNA recovered 
by phenol/chloroform extraction and ethanol precipitation. Annealed oligomer 
adaptors (oligo1: 5′-GCGGTGACCCGGGAGATCTGAATTC-3′; oligo2: 
5′-GAATTCAGATC-3′) were ligated to blunt DNA ends at 2 µM with 4,000 units 
ml⁻¹ T4 DNA ligase (New England Biolabs) overnight at 16 °C, and DNA was 
precipitated with ethanol. DNA fragments were amplified for 22 cycles (55 °C, 
2 min; 72 °C, 5 min; 95 °C, 2 min; 95 °C, 1 min; 60 °C, 1 min; 72 °C, 2 min; start 
cycle again at step 4) using 1 µM oligo1 and a Taq DNA polymerase (100 units 
ml⁻¹)/Pfu DNA polymerase (0.5 units ml⁻¹) mix. Amplified DNA was purified 
with a column (PCR Purification Kit, Qiagen). 
Microarray hybridizations, data analysis and display. Amplified ChIP DNA 
was labelled and hybridized by the Roche Nimblegen Service Laboratory. 2.1M 
probe tiling arrays, with 50-bp probes, designed against WormBase version 
WS170 (ce4) were used for all experiments. ChIP samples were labelled with 
Cy5 and their input reference with Cy3. One ChIP was dye-swapped, which 
resulted in the same pattern (not shown). For each probe, the intensity from the 
sample channel was divided by the reference channel and transformed to log₂. The 
enrichment scores for each replicate were calculated by standardizing the log ratios 
to mean zero and standard deviation one (z-score). Genome-wide scatter plots and 
Pearson correlations between all ChIP targets and replicates were obtained using 
all probe z-scores after median smoothing over 1-kb windows. 
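
The enrichment score and smoothing steps can be sketched as below. This is an illustrative reading of the description, not the original pipeline: it assumes per-probe intensities as NumPy arrays, probe positions sorted along one chromosome, and a smoothing window centred on each probe (the text does not say how the 1-kb windows were positioned). Function names are hypothetical.

```python
import numpy as np

def enrichment_zscores(sample, reference):
    """log2(sample / reference), standardized to mean zero and standard deviation one."""
    log_ratio = np.log2(np.asarray(sample, float) / np.asarray(reference, float))
    return (log_ratio - log_ratio.mean()) / log_ratio.std()

def median_smooth(positions, values, window_bp=1000):
    """Median of all probes within window_bp / 2 of each probe (positions must be sorted)."""
    positions = np.asarray(positions)
    values = np.asarray(values, float)
    half = window_bp / 2
    left = np.searchsorted(positions, positions - half, side="left")
    right = np.searchsorted(positions, positions + half, side="right")
    return np.array([np.median(values[lo:hi]) for lo, hi in zip(left, right)])

z = enrichment_zscores([2.0, 4.0, 8.0, 16.0], [1.0, 1.0, 1.0, 1.0])
smoothed = median_smooth([0, 100, 200, 5000], [1.0, 100.0, 3.0, 7.0])
```

Correlations between two ChIP targets could then be computed from their smoothed tracks, for example with `np.corrcoef`.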

The average z-score of two replicates was used for all analyses, except in 
Supplementary Figs 9, 10 and 13c, where individual data sets from extracts with 
distinct age distributions of embryos are compared. Accession numbers for data 
sets used in this study are listed in Supplementary Table 3. Scatter plots and 
boxplots for genes were generated by averaging z-scores of probes located com- 
pletely within the transcript start site (TSS) and end site (TES). TSS and TES 
coordinates were obtained from WormBase (WS170). 

Gene body profile plots (Supplementary Fig. 7b) were generated by aligning 
genes of length greater than 2 kb at their TSS and TES. The genomic regions 1.5 kb 
upstream to 1kb downstream from TSS and 1kb upstream to 1 kb downstream 
from TES were divided into 50-bp bins, and probes were assigned to the nearest 
bin. Gene group profiles were generated by averaging probe z-scores within each 
bin across genes in the group. 

Definition of CeCENP-A-enriched domains. CeCENP-A signal was averaged 
over 2-kb windows, every 50 bp. A random distribution for the window averages 
was obtained by randomly sampling and assigning CeCENP-A values for each 
chromosome. The resulting random CeCENP-A tracks were also averaged over 
2-kb windows. A cutoff was selected so that the number of random window 
averages above the cutoff was less than 3% of the number of non-randomized 
windows above the cutoff, effectively providing a 3% false positive rate with respect 
to the random window averages. Overlapping windows above the cutoff were 
combined into domains. Domains with gaps smaller than 2kb were merged, 
and domains smaller than 2.5 kb were excluded. 
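
The domain definition above can be sketched in three steps: window averaging, cutoff selection against a randomized track, and merging. The code is a simplified illustration under stated assumptions: probes are treated as evenly spaced (so a 2-kb window is a fixed number of probes), the cutoff search is a plain scan over observed window averages, and all names are hypothetical rather than taken from the original analysis.

```python
import numpy as np

def window_averages(values, probes_per_window):
    """Sliding mean over a fixed number of consecutive probes, one step per probe."""
    kernel = np.ones(probes_per_window) / probes_per_window
    return np.convolve(np.asarray(values, float), kernel, mode="valid")

def choose_cutoff(real_avgs, random_avgs, max_fraction=0.03):
    """Lowest cutoff at which random windows above it are < max_fraction of real ones."""
    real_avgs = np.asarray(real_avgs, float)
    random_avgs = np.asarray(random_avgs, float)
    for cutoff in np.unique(real_avgs):  # np.unique returns sorted values
        n_real = np.sum(real_avgs > cutoff)
        n_random = np.sum(random_avgs > cutoff)
        if n_real > 0 and n_random < max_fraction * n_real:
            return float(cutoff)
    return None

def call_domains(windows, max_gap=2000, min_length=2500):
    """Merge above-cutoff windows (start, end in bp, sorted by start) into domains."""
    domains = []
    for start, end in windows:
        if domains and start - domains[-1][1] < max_gap:
            # Overlapping windows and gaps below max_gap extend the current domain.
            domains[-1][1] = max(domains[-1][1], end)
        else:
            domains.append([start, end])
    return [(s, e) for s, e in domains if e - s >= min_length]

averages = window_averages([1.0, 2.0, 3.0, 4.0], 2)
cutoff = choose_cutoff([0.1, 0.2, 3.0, 4.0], [0.1, 0.15, 0.2, 0.3])
domains = call_domains([(0, 2000), (50, 2050), (3000, 5000), (20000, 22000)])
```

A randomized track for `choose_cutoff` could be produced by shuffling the per-chromosome values, for example with `np.random.default_rng().permutation(values)`, before calling `window_averages`.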

Definition of gene classes based on expression profiling data sets. Gene classes 
were defined on the basis of expression data, as described previously. In brief, 
‘Ubiquitous’ or housekeeping genes have transcripts present in muscle, gut, neuron and 
adult germline SAGE (serial analysis of gene expression) data sets; ‘Germline-expressed’ 
genes have transcripts present in the dissected adult hermaphrodite 
germ line SAGE data set; ‘Serpentine’ receptor genes are expressed in mature 
neurons and silent in embryos; ‘Spermatogenesis’ genes are classified as 
expressed during sperm production on the basis of comparative microarray analysis; 
‘Germline-only’ genes are expressed exclusively in the maternal germ line, 
as their transcripts are enriched in the germline, maternally loaded into 
embryos, and absent from muscle, gut and neuron SAGE data sets; transcripts 
of ‘Embryo-expressed’ genes are not maternally provided and increase in 
level during embryogenesis. 

Criteria for identifying genes with maximal changes in RNA Pol II levels. 
Genes with maximal changes in RNA Pol II levels between the two averaged early 
embryo (EE) and late embryo (LE) extracts (Supplementary Fig. 10a, b), were 
identified by applying a moderated t-test and requiring a false discovery rate 
smaller than 5% (ref. 32). In addition, RNA Pol II levels for those genes were 
required to show at least a twofold change in RNA Pol II ChIP-chip hybridization 
signal between the averaged EE and LE extracts. 

Transcriptional profiling of embryos. RNA was isolated from 10 µl of packed 
embryos using TRIzol (Invitrogen) and the RNeasy kit (Qiagen). RNA (20 µg) was 
hybridized to a single-colour 4-plex Nimblegen expression array with 72,000 
probes (three 60-mer oligo probes per gene). Quantile normalization and the 
robust multichip average (RMA) algorithm were used to normalize and summarize 
the multiple probe values per gene to obtain one expression value per gene 
and sample. The expression values per gene were averaged across samples as 
indicated in the figure legends. 
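
For readers unfamiliar with quantile normalization, the sketch below shows the basic operation on a probes-by-samples matrix. It covers only the normalization step, not the RMA summarization, and it is a generic textbook version rather than the implementation used for these arrays; ties within a sample are ranked in arbitrary order here, whereas production implementations usually average them.

```python
import numpy as np

def quantile_normalize(matrix):
    """Give every column (sample) the same distribution: the mean of the sorted columns."""
    matrix = np.asarray(matrix, float)
    order = np.argsort(matrix, axis=0)       # row indices that sort each column
    ranks = np.argsort(order, axis=0)        # rank of each value within its column
    mean_quantiles = np.sort(matrix, axis=0).mean(axis=1)
    return mean_quantiles[ranks]

# Four probes measured in three samples (no ties within any sample).
raw = [[5.0, 4.0, 3.0],
       [2.0, 1.0, 4.0],
       [3.0, 6.0, 6.0],
       [4.0, 2.0, 8.0]]
normalized = quantile_normalize(raw)
```

After normalization every sample column contains the same set of values, so differences between samples reflect rank changes rather than overall intensity shifts.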
Hsp72 preserves muscle function and slows 
progression of severe muscular dystrophy 


Stefan M. Gehrig, Chris van der Poel, Timothy A. Sayer, Jonathan D. Schertzer, Darren C. Henstridge, Jarrod E. Church, 
Severine Lamon, Aaron P. Russell, Kay E. Davies, Mark A. Febbraio & Gordon S. Lynch 


Duchenne muscular dystrophy (DMD) is a severe and progressive 
muscle wasting disorder caused by mutations in the dystrophin 
gene that result in the absence of the membrane-stabilizing protein 
dystrophin. Dystrophin-deficient muscle fibres are fragile and 
susceptible to an influx of Ca²⁺, which activates inflammatory and 
muscle degenerative pathways. At present there is no cure for 
DMD, and existing therapies are ineffective. Here we show that 
increasing the expression of intramuscular heat shock protein 72 
(Hsp72) preserves muscle strength and ameliorates the dystrophic 
pathology in two mouse models of muscular dystrophy. Treatment 
with BGP-15 (a pharmacological inducer of Hsp72 currently in 
clinical trials for diabetes) improved muscle architecture, strength 
and contractile function in severely affected diaphragm muscles in 
mdx dystrophic mice. In dko mice, a phenocopy of DMD that results 
in severe spinal curvature (kyphosis), muscle weakness and 
premature death, BGP-15 decreased kyphosis, improved the 
dystrophic pathophysiology in limb and diaphragm muscles and 
extended lifespan. We found that the sarcoplasmic/endoplasmic 
reticulum Ca²⁺-ATPase (SERCA, the main protein responsible for 
the removal of intracellular Ca²⁺) is dysfunctional in severely 
affected muscles of mdx and dko mice, and that Hsp72 interacts with 
SERCA to preserve its function under conditions of stress, ultimately 
contributing to the decreased muscle degeneration seen with Hsp72 
upregulation. Treatment with BGP-15 similarly increased SERCA 
activity in dystrophic skeletal muscles. Our results provide evidence 
that increasing the expression of Hsp72 in muscle (through the 
administration of BGP-15) has significant therapeutic potential 
for DMD and related conditions, either as a self-contained therapy 
or as an adjuvant with other potential treatments, including gene, 
cell and pharmacological therapies. 

DMD is the most severe form of muscular dystrophy; it affects about 1 
in 3,500 live male births. Intracellular Ca²⁺ regulation is compromised 
in dystrophic muscle fibres, which triggers chronic inflammation, 
repeated cycles of degeneration with progressively ineffective regeneration, 
and infiltration of fibrotic and other non-contractile material. 
Mechanisms for the influx of Ca²⁺ into dystrophic muscle fibres include 
membrane tears, stretch-activated channels, Ca²⁺ leak channels 
and leaky Ca²⁺ release channels, and it has been speculated that the 
function of SERCA, the main protein responsible for Ca²⁺ reuptake into 
the sarcoplasmic reticulum (SR), is compromised. Increasing 
SERCA pump expression within dystrophic muscles in transgenic mice 
or through viral-mediated delivery improves Ca²⁺ handling and suppresses 
the pathological cascade of events. The role of inflammation 
in the dystrophic pathology is well known, particularly that of the 
pro-inflammatory cytokine tumour necrosis factor-α (TNF-α). TNF-α 
activates the nuclear factor-κB (NF-κB) and c-Jun N-terminal kinase 
(JNK) signalling pathways. Hsp72 is a molecular chaperone protein 
that inhibits inflammatory mediators including p-JNK, TNF-α and the 
NF-κB pathway, and binds and preserves SERCA function under 
conditions of cellular stress. Although some studies have shown 
Hsp72 to be elevated in patients with DMD, there is little consensus, 
because expression data for young patients are variable and sourcing 
age-matched controls is problematic (Supplementary Fig. 1a). 
Nevertheless, the endogenous heat shock protein response in DMD 
is insufficient to be protective. Here we tested the hypothesis that 
increasing the levels of Hsp72 protects dystrophic muscles from func- 
tional deterioration. We bred dystrophin-null mdx dystrophic mice 
with mice showing a muscle-specific transgenic (TG) overexpression 
of Hsp72, producing mdxTG(+) mice and mdx littermate controls 
(Fig. 1a; see Supplementary Fig. 1b for quantification). At about 25 
weeks of age, serum levels of creatine kinase (CK), a classic indicator of 
muscle breakdown, were decreased in mdxTG(+) mice compared with 
littermate control mice lacking the transgene (mdxTG(−); Fig. 1b). 
Because most patients with DMD show severe weakness and/or 
muscle fatigue, we assessed whole-body strength and endurance in 
dystrophic mice by using a hang test to measure latency-to-fall, 
which was significantly improved in mdxTG(+) mice (24 ± 3 s versus 
64 ± 14 s; P = 0.002, n = 20 mice). Respiratory failure is the cause of 
death in up to 90% of patients with DMD, and because diaphragm 
function is an accurate predictor of respiratory insufficiency we investigated 
the effect of Hsp72 overexpression on the pathophysiology of 
the diaphragm in dystrophin-deficient mice. The progressive degeneration 
of the diaphragm in mdx mice closely mimics that in DMD. 
Gross histological analyses revealed that the diaphragm pathology 
in mdxTG(+) mice was ameliorated compared with age-matched 
littermate control mdxTG(−) mice (Supplementary Fig. 1c), an observation 
supported by the minimal Feret’s diameter variance coefficient, 
which provides a sensitive measure of fibre heterogeneity and the 
dystrophic pathology. We found a lower Feret’s diameter variance 
coefficient in mdxTG(+) mice than in mdxTG(−) mice, indicative of an 
improved phenotype (405 ± 5 versus 358 ± 5; P < 0.001, n = 5). 
Damaged myofibres can be revealed by the infiltration of Evans blue 
dye (EBD) entering the myoplasm through tears in the sarcolemma. 


(Fig. 1c and Supplementary Fig. 1d), indicative of decreased necrosis 
and hastened overall repair of damaged fibres rather than improved 
structural integrity. To support this contention, we performed well- 
described in situ and in vitro contraction-induced injury protocols on 
tibialis anterior (TA) and diaphragm muscles, respectively (see 
Methods). No differences in contraction-mediated damage were 
evident between muscles of mdxTG(+) and mdxTG(−) mice (Supplementary 
Fig. 1e), indicating that structural integrity was unaltered. 
Expression of the dystrophin homologue, utrophin, a protein known to 
compensate for the loss of dystrophin, was also unchanged (Supplementary 
Fig. 1f). We assessed collagen infiltration in sections of 
Figure 1 | Transgenic Hsp72 overexpression increases muscle strength, 
decreases muscle breakdown and improves diaphragm muscle histological 
parameters in mdx mice. a, Representative western blot detection of Hsp72 in 
diaphragm muscle homogenates from non-dystrophic (WT) and dystrophic 
(mdx) Hsp72 transgene-negative (TG−) and transgene-positive (TG+) mice. 
GAPDH, glyceraldehyde-3-phosphate dehydrogenase. b, Whole-body muscle 
breakdown, measured by serum creatine kinase (CK) levels. Asterisk, P < 0.05, 
n = 6. c, Quantified mean data for EBD-positive area in diaphragm muscle 
sections. Asterisk, P < 0.05, n = 4. d, Representative images of collagen 
infiltration in diaphragm sections (revealed with Van Gieson’s stain) from 
mdxTG(−) and mdxTG(+) mice. e, Specific (normalized) force of diaphragm 
muscle strips measured in vitro. Asterisk, P < 0.05, n = 5. All data are from 
25-week-old mice. Scale bar, 200 µm. Data are shown as means ± s.e.m. 
diaphragm and found that mdxTG(+) mice had less collagen infiltration 
than mdxTG(−) mice at both 25 weeks of age (Fig. 1d and Supplementary 
Fig. 1g) and 80 weeks of age (data not shown). Normalized force 
production was significantly higher in diaphragm muscle strips from 
mdxTG(+) mice than in those from mdxTG(−) mice (Fig. 1e). 

We have shown that Hsp72 can block inflammation in mice 
in vivo, and because inflammation promotes muscle degeneration 
in mouse models of DMD we examined the effect of Hsp72 on 
expression of p-JNK, p-IKK (a key mediator of NF-κB activation) 
and TNF-α. Western blot and polymerase chain reaction (PCR) 
analyses revealed no difference in these inflammatory markers in diaphragm 
muscles from mdxTG(+) and mdxTG(−) mice (Supplementary 
Fig. 2, and data not shown). There was significantly decreased 
messenger RNA expression of macrophage markers CD68 and F4/80 
and TNF-α in TA muscles of mdxTG(+) mice (Supplementary 
Fig. 3), but these did not translate to functional improvements (data 
not shown). We next examined SERCA activity in diaphragm muscles 
from wild-type (WT) and mdx mice and showed a progressive age-related 
decline in maximal SERCA activity in mdx mice (Fig. 2a and 
Supplementary Fig. 4a), despite an age-related increase in SERCA 
protein expression (Supplementary Fig. 4b). This functional decline 
is attributed, in part, to post-translational modifications of the SERCA 
protein, especially nitrosylation, which decreases maximal SERCA 
activity as a result of changes in Ca²⁺-binding and ATP-binding 
domains. Alterations in the Ca²⁺-binding domain change SERCA 
Figure 2 | Maximal SERCA activity is decreased in mouse models of 
dystrophy; Hsp72 binding improves SERCA function. a, Maximal SERCA 
activity in diaphragm muscle homogenates from WT and mdx mice at 3, 10 and 
30 weeks of age. Asterisk, P < 0.001, n = 4. b, Maximal SERCA activity of 
muscle homogenates from quadriceps and diaphragm from 10-week-old WT 
and dko mice. Asterisk, P = 0.036. Two asterisks, P < 0.001, n = 4. c, Maximal 
SERCA activity in muscle homogenates from mdxTG(−) (pale grey bars) and 
mdxTG(+) (dark grey bars) mice. Asterisk, P < 0.008, n = 9. d, Ca²⁺ 
accumulation curves (an indirect measure of SERCA activity) for SR in single 
fibres from extensor digitorum longus muscles of mdxTG(−) and mdxTG(+) 
mice. Asterisk, P < 0.01, n = 15. e, Maximal SERCA activity after incubation 
for 5 min at various concentrations of peroxynitrite (ONOO⁻), measured in 
enriched SR vesicles isolated from muscles of mdxTG(−) and mdxTG(+) mice. SERCA 
activity was normalized to maximal activity in the absence of ONOO⁻. 
Asterisk, P < 0.05, n = 3. f, Representative western blots of enriched SR vesicles 
from mdxTG(−) and mdxTG(+) (n = 4) mice, showing Hsp72 protein expression 
and Coomassie blue stain (showing SERCA isoforms). Data are shown as 
means ± s.e.m. 


enzyme kinetics, decreasing Ca²⁺ sensitivity and increasing [Ca²⁺]₅₀ 
(the [Ca²⁺] required to achieve half-maximal enzyme activity), an 
effect we observed in diaphragm muscles from 30-week-old mdx mice 
(Supplementary Fig. 4c). Similar deficits in SERCA activity were evident 
in both limb and diaphragm muscles of severely affected dko mice 
(Fig. 2b), indicating that abnormal SERCA function may contribute to 
the disruptions in Ca²⁺ regulation characteristic of dystrophic 
muscles. Indeed, recent evidence suggests that closer regulation of 
Ca²⁺ homeostasis through enhanced SERCA expression or activity 
significantly suppresses the degeneration of dystrophic muscle. 
Because Hsp72 binds SERCA and prevents functional inactivation 
under conditions of cellular stress, we next tested the hypothesis that 
overexpression of Hsp72 in dystrophic muscles would improve SERCA 
function. We examined maximal SERCA activity in homogenates of 
quadriceps muscles, and found an increase in activity in mdxTG(+) 
compared with mdxTG(−) mice (Fig. 2c). To support this finding we 
also examined Ca²⁺ accumulation in SR (an indirect measure of 
SERCA activity) in single muscle fibres dissected from fast-twitch 
extensor digitorum longus, and predominantly slow-twitch soleus 
muscles, and found a similar increase in maximal SERCA activity in 
individual muscle fibres in mdxTG(+) compared with mdxTG(−) mice 
(as for whole-muscle homogenates), with no changes in mRNA for 
SERCA or in protein expression (Fig. 2d, Supplementary Fig. 5 and 
Supplementary Table 1). We then examined whether Hsp72 overexpression 
could protect SERCA function under conditions of 
stress, like that induced by reactive oxygen-nitrogen species such as 
peroxynitrite (ONOO⁻). We developed a SERCA activity assay in 
which enriched SR vesicles isolated from mdxTG(+) and mdxTG(−) mice 
were incubated with increasing concentrations of ONOO⁻; the subsequent 
suppression in activity was normalized to that in the absence of 
ONOO⁻. We found that SERCA activity was greater in the presence of 
various ONOO⁻ concentrations in enriched SR vesicles from 
mdxTG(+) compared with mdxTG(−) mice (Fig. 2e), with western blots 
revealing Hsp72 protein levels were highly elevated in enriched SR 
vesicles from mdxTG(+) mice (Fig. 2f; see Supplementary Fig. 6 for full 
blots), indicating that Hsp72 was bound within the SR to mediate this 
protective effect. These data indicate that Hsp72 overexpression in 
dystrophic muscles can protect SERCA from inactivating modifications 
and is a likely mechanism of protection in mdxTG(+) mice. 

Given the significant phenotypic improvements in mdxTG(+) mice, 
especially in the diaphragm, we examined whether similar effects could 
be achieved through the pharmacological or heat-therapy induction of 
Hsp72. BGP-15 is a pharmacological inducer of Hsp72 that can protect 
against obesity-induced insulin resistance and is in Food and Drug 
Administration (FDA)-approved Phase II clinical trials for diabetes. 
Dystrophic mdx mice were treated from 4 to 9 weeks of age (5 weeks) 
or from 4 to 16 weeks of age (12 weeks) with BGP-15 (15 mg kg⁻¹ per 
day, oral gavage). WT mice were used for comparisons with mdx mice. 
Hsp72 protein expression was elevated in diaphragm muscle homo- 
genates from BGP-15-treated mdx mice (and heat-therapy-treated 
mice) compared with untreated mdx mice (Fig. 3a; see Supplemen- 
tary Fig. 7a for quantification). CK levels were decreased in BGP-15- 
treated mdx mice compared with untreated mdx mice (Fig. 3b). EBD 
infiltration was also reduced with long-term BGP-15 treatment 
(Fig. 3c). Strength and endurance were evaluated with the inverted hang 
test; WT mice were stronger than mdx mice, and BGP-15-treated mdx 
mice showed an increased latency-to-fall compared with control 
(Supplementary Fig. 7). An increase in fibrosis, as seen in DMD, 
was observed in the mdx diaphragm (compared with WT), and 
BGP-15 treatment significantly decreased fibrosis (Fig. 3d; see 
Supplementary Fig. 7 for quantification). Treatment with BGP-15 
attenuated the functional deterioration of the diaphragm muscle sig- 
nificantly (Fig. 3e). Maximal SERCA activity in diaphragm homoge- 
nates was increased in mdx mice after long-term treatment with 
BGP-15, indicating a mechanism consistent with that of transgenic 
Hsp72 overexpression (Fig. 3f). Because elevated core temperature is 
a potent inducer of heat shock proteins, we also extensively tested 
this method of heat shock protein induction (see Methods). Similar 
beneficial effects to those observed with transgenic Hsp72 overexpres- 
sion and BGP-15 treatment were seen in mdx mice exposed to repeated 
heat therapy (see Supplementary Fig. 8). 

We then investigated whether treatment with BGP-15 was protective 
in severely affected dystrophic dko mice, the most phenotypically 
accurate murine model of DMD (see Methods). The dko mice were 
treated with BGP-15 from 3-4 weeks until 10 weeks of age. For the 
survival study, dko mice were treated from 3-4 weeks until death or 
humane killing (as described in Methods). Photographs of control and 
BGP-15-treated dko mice were taken at 10 weeks, immediately before 
killing, and after evisceration and staining of the skeleton with alizarin 
red (Fig. 4a). Data from WT mice (as in Fig. 3) were also used for 
comparisons with dko mice. Because boys with DMD have significant 
paraspinal muscle weakness and in many cases severe kyphosis (spinal 
curvature), this was quantified in conscious mice (from the 10-week 
endpoint cohort) by a blinded investigator using a 1-5 index of 
Figure 3 | Pharmacological induction of Hsp72 ameliorates muscular 
dystrophy in mdx mice. a, Representative western blot detection of Hsp72 in 
diaphragm muscle homogenates of control (Con.) and BGP-15-treated mdx 
mice. b, Whole-body muscle breakdown, measured by serum CK levels in WT 
(C57BL/10), mdx (control) and BGP-15-treated mdx mice. Asterisk, P < 0.01, 
n = 8. c, Quantified mean data for EBD infiltration in mdx mice treated for 12 
weeks. Asterisk, P < 0.05, n = 8. d, Representative images of collagen 
infiltration in diaphragm muscle sections. e, Specific force production in 
diaphragm muscle strips measured in vitro. Asterisk, P < 0.01, n = 9. 
f, Maximal SERCA activity in diaphragm muscle homogenates from mdx mice 
treated for 12 weeks. Asterisk, P < 0.05, n = 6. Scale bar, 200 µm. WT data are 
used as a reference control. All treated mice received BGP-15 for 5 weeks unless 
stated otherwise. Data are shown as means ± s.e.m. 


kyphosis, 1 indicating no spinal deformity on palpation and 5 being 
the most severe. Treatment with BGP-15 decreased kyphosis markedly 
compared with vehicle-treated controls (Fig. 4b). Serum CK levels were 
significantly lower in dko mice after treatment with BGP-15 (Fig. 4c), 
indicating decreased whole-body muscle breakdown. Collagen 
infiltration was decreased in the diaphragm of dko mice after treatment 
with BGP-15 (Fig. 4d) and the force-producing capacity of diaphragm 
muscle strips and intact TA muscles (measured in situ) was increased 
significantly in BGP-15-treated dko mice, with maximum force 
restored to WT levels in the TA muscle (Fig. 4e, f and Supplemen- 
tary Table 2). No differences in body mass or in calcification or central 
nucleation within diaphragm muscles were evident in dko mice after 
treatment with BGP-15 (data not shown). However, the most import- 
ant outcome was that lifelong treatment of dko mice with BGP-15 
significantly extended survival (Fig. 4g,h; P< 0.05; 27% increase in 
median lifespan). This finding has clinical relevance for DMD. 

Our findings reveal that transgenic Hsp72 overexpression improves 
several pathological indices in mdx dystrophic mice, at least in part by 
preserving or improving SERCA function. Furthermore, treatment of 
dystrophic mice with BGP-15, a known pharmacological co-inducer of 
Hsp72, ameliorated the dystrophic pathology and extended the 
lifespan in dko mice. Taken together, these results indicate that induc- 
tion of Hsp72 in muscular dystrophy is an important and novel 
therapeutic approach that can improve the dystrophic pathology 
and attenuate the disease progression. Although an ultimate cure for 
DMD is likely to be derived from gene or cell therapies, considerable 
Figure 4 | Treatment with BGP-15 decreases kyphosis (spinal curvature), 
improves muscle function and prolongs lifespan in severely dystrophic dko 
mice. a, Left: representative photographs of WT (C57BL/10), dko and BGP-15- 
treated dko mice. Right: representative eviscerated skeletal preparations showing 
bone structure (pink), and highlighting spinal curvature in dko mice. b, Spinal 
curvature was quantified (1-5) in WT, dko and BGP-15-treated dko mice. 
Asterisk, P< 0.05, n = 10. c, Whole-body muscle breakdown, measured by 
serum CK levels. Asterisk, P< 0.05, n = 8. d, Representative images of collagen 
infiltration in diaphragm sections. e, Specific force of diaphragm muscle strips 
measured in vitro. Asterisk, P< 0.01, n= 9. f, Maximal force production in TA 
measured in situ. Asterisk, P< 0.01, n = 9. g, Survival curve of untreated 
(control) and BGP-15-treated dko mice. P< 0.05, n = 14. h, Scatter-plot of dko 
lifespan with a line showing median survival. Scale bar, 200 1m. WT data are 
used as a reference control. Data are shown as means ~ s.e.m. 


obstacles need to be overcome before these approaches can be con- 
sidered safe and effective. Until these concerns are obviated, alterna- 
tive (and potentially synergistic) therapies, such as pharmacological 
induction of Hsp72, could delay the disease progression to allow many
patients to benefit from perfected treatments. 

In a recent clinical trial for patients with insulin resistance, treat- 
ment with BGP-15 (200 mg and 400 mg once a day for 28 days) sig- 
nificantly increased sensitivity to insulin. The dose schedule of BGP-15 
was well tolerated, with both doses being safe: there were no clinically 
significant changes in physical status or in laboratory or electro- 
cardiogram parameters. Given that BGP-15 is currently used in
clinical trials for other pathologies, our findings identify it as a tangible 
and realistic treatment method for patients with DMD in the near 
future. 


METHODS SUMMARY 


Animals. All experiments were approved by the Animal Ethics Committee of The 
University of Melbourne and conducted in accordance with the Australian code of 
practice for the care and use of animals for scientific purposes as stipulated by the 
National Health and Medical Research Council (NHMRC, Australia). Male mice 
were used for all experiments. Wild-type (C57BL/10) and dystrophic mdx mice 
were sourced from the Animal Resources Centre (Canning Vale, Western 
Australia). 

Muscle functional analysis. Mice were anaesthetized with sodium pentobarbitone 
(Nembutal) such that they were unresponsive to tactile stimuli. Contractile 
properties of diaphragm muscle strips were assessed in vitro. 

Morphological analysis. Muscles were trimmed of tendons and adhering non- 
muscle tissue, mounted in embedding medium, frozen in liquid-nitrogen-cooled 
isopentane, and stored at −80 °C. Transverse muscle sections were cryosectioned
from the mid-belly of each muscle. Muscle collagen content was assessed from 
Van Gieson’s stained cross-sections that were quantified. 

SERCA activity assay. Ca²⁺-dependent SERCA activity was assessed in isolated
enriched SR vesicles and whole-muscle homogenates. For whole-muscle
homogenates, muscles were surgically excised from anaesthetized mice and stored at
−80 °C for subsequent analyses. For enriched SR vesicles, mixed hindlimb
muscles (quadriceps, gastrocnemius, extensor digitorum longus, soleus and plantaris)
and diaphragm muscles were homogenized and subjected to sucrose gradient 
differential centrifugation using a Thermo Scientific Sorvall WX100 ultracentri- 
fuge with a T-890 fixed-angle rotor. During the entire homogenization and SR 
vesicle isolation procedures, samples were immersed in ice to avoid temperature- 
dependent decreases in SERCA activity. 

Skeletal preparation. To reveal skeletal architecture, mice (after death) were 
skinned, eviscerated and placed in a KOH solution (1.5% w/v), for 5 days. KOH 
solution was replaced and a small amount of alizarin red was added to stain 
calcium deposits; the preparation was left for a further 5 days. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Animals. Female mdx mice were crossed with male mice expressing a rat inducible
Hsp72 transgene under the control of a β-actin promoter. F1 generation males
were mated with female mdx mice to produce an equal proportion of mdx^TG(+) and
mdx^TG(−) littermate controls. WT^TG(+) and WT^TG(−) mice were generated as
described previously. Genotyping was performed by PCR using primers
described previously. C57BL/10 and mdx mice used for heat therapy and
BGP-15 treatment studies were obtained from the Animal Resources Centre
(Canning Vale, Western Australia). Dystrophic mdx mice were treated from
4 weeks to 9 weeks of age (5 weeks) or from 4 weeks to 16 weeks of age (12 weeks)
with BGP-15 (15 mg kg⁻¹ per day, oral gavage; N-Gene R&D Inc.). BGP-15 is a
pharmacological inducer of Hsp72, which has been shown to be safe and well
tolerated in FDA-approved Phase II clinical trials for diabetes and insulin
resistance. For heat therapy, the mice were anaesthetized and the core body
temperature was raised to about 40 °C with an infrared thermometer for 30 min every
fourth day; body temperature was monitored from the external auditory meatus as
described previously. The dko mice were bred in the Biological Research Facility
at the University of Melbourne. The dko mice are utrophin-null on an mdx
background and show a severe and progressive muscular dystrophy that more
closely mimics the phenotypic characteristics of DMD. The dko mice were
treated with BGP-15 from 3-4 weeks to 10 weeks of age. For the survival study,
dko mice were treated from 3-4 weeks until death or humane killing in accordance
with the Animal Ethics Committee of The University of Melbourne (AEC). All
experiments were approved by the AEC and conducted in accordance with the
Australian code of practice for the care and use of animals for scientific purposes as
stipulated by the National Health and Medical Research Council (Australia). Male
mice were used for all experiments.

Muscle functional analyses. Mice were anaesthetized with sodium pentobarbitone
(Nembutal; 60 mg kg⁻¹ intraperitoneal; Sigma-Aldrich, New South Wales,
Australia) such that they were unresponsive to tactile stimuli. Contractile properties
of diaphragm muscle strips were assessed in vitro as described in detail previously.
In brief, strips of diaphragm muscle were bathed in oxygenated Krebs solution at
25 °C in a custom-made organ bath. Preparations were stimulated by supramaximal
0.2-ms square-wave pulses of 450 ms duration delivered by means of platinum plate
electrodes flanking both sides of the muscle. Contractile properties of the TA muscle
were assessed in situ. In brief, TA muscles maintained at 37 °C were stimulated by
supramaximal (14 V) 0.2-ms square-wave pulses of 350 ms duration delivered by
means of two wire electrodes resting on the peroneal nerve. All muscles were
adjusted to optimum muscle length (Lo), determined from maximum isometric
twitch force (Pt). Maximum isometric tetanic force (Po) was recorded from the
plateau of the frequency-force relationship, and normalized to muscle
cross-sectional area (specific force; sPo), for comparisons between groups where
appropriate. The susceptibility of TA and diaphragm muscles to contraction-
induced injury was determined from the protocols that we have described in detail
previously. Isolated muscles were maximally activated to produce isometric
force and then stretched to perform an eccentric contraction (at a velocity of
2 Lf s⁻¹) at progressively increasing magnitudes of stretch. Maximum isometric
force was determined before each eccentric contraction.

Morphological analysis. Muscles were trimmed of tendons and adhering non- 
muscle tissue, mounted in embedding medium, frozen in liquid-nitrogen-cooled 
isopentane, and stored at −80 °C. Transverse muscle sections (5 μm) were
cryosectioned from the mid-belly of each muscle. Sections were stained with
haematoxylin/eosin to reveal general muscle architecture. Cross-sectional area
(CSA) and minimal Feret’s diameter were assessed in about 500 fibres from each
section of diaphragm muscle, with the use of Carl Zeiss software (Axiovision
4.6.3). The minimal Feret’s diameter is the minimum distance between opposing
parallel tangents of a muscle fibre, and the variance coefficient is a highly
sensitive parameter used for detecting differences between dystrophic and
otherwise healthy muscles. Muscle collagen content was assessed from Van
Gieson’s stained cross-sections that were quantified with Axiovision 4.6.3 
software. 

Enriched SR isolation. Mixed hindlimb muscles (quadriceps, gastrocnemius, 
extensor digitorum longus, soleus and plantaris) and diaphragm muscles were 
diluted in ice-cold homogenization buffer (250 mM sucrose, 5 mM HEPES pH 7.0, 
0.2% NaN3). Protease inhibitor cocktail (P-8340; Sigma-Aldrich, Castle Hill, New
South Wales, Australia) was added immediately before homogenization at 5 μl per
100 mg muscle wet weight. The muscles were minced on ice with surgical scissors 
and homogenized with a Polytron homogenizer (PT2100; Kinematica, Inc. 
Dispersing and Mixing Technology, New York) at a power setting of 21 for three 
20-s bursts separated by 45-s breaks on ice. To obtain a purified and enriched SR 
membrane fraction, the homogenates were centrifuged at 5,500g for 10 min at 4 °C 
to remove insoluble material. The supernatants were harvested and centrifuged at 
12,500g for 18 min at 4 °C. The pellet was discarded and the supernatant was
centrifuged for a second time at 12,500g for 18 min at 4 °C. Supernatants were
transferred to ultracentrifuge tubes, which were balanced and centrifuged at 
50,000g (24,200 r.p.m. on a T-890 fixed-angle rotor; Thermo Scientific Sorvall 
WX100 ultracentrifuge) for 1h at 4°C. Supernatants were discarded and pellets 
were resuspended in 5 ml of homogenization buffer containing 600 mM KCl and 
incubated on ice for 30 min. Samples were centrifuged at 15,000g for 10 min at 4 °C 
to pellet and remove mitochondria. Supernatants were centrifuged again at 
50,000g for 1h at 4°C. The final pellet (enriched SR membrane vesicles) was 
resuspended in homogenization buffer. Protein content was determined in 
triplicate.
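The paired values quoted above (50,000g at 24,200 r.p.m.) can be checked with the standard relation between relative centrifugal force and rotor speed. The following is a minimal sketch, not part of the original protocol; the function names are hypothetical, and the rotor radius is back-calculated from the quoted pair rather than taken from the rotor's specification.

```python
import math

# Standard conversion factor for radius in cm and speed in r.p.m.:
# RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (r.p.m.) needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def implied_radius_cm(rcf, rpm):
    """Effective radius implied by a quoted (RCF, r.p.m.) pair."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Effective radius implied by the values quoted in the text (an inference,
# not a manufacturer figure).
radius = implied_radius_cm(50_000, 24_200)
print(f"implied radius: {radius:.2f} cm")
```

The same helpers can be used to translate the lower-speed spins (5,500g, 12,500g, 15,000g) to a speed setting for a different rotor, provided that rotor's radius is known.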

SERCA activity assay. Ca²⁺-dependent SERCA activity was assessed in isolated
enriched SR vesicles and whole-muscle homogenates on the basis of the methods of
Leberer and colleagues, as described previously. For whole-muscle homogenates,
muscles were surgically excised from anaesthetized mice and stored at −80 °C.
Muscle samples (about 20-50 mg) were diluted in about 200 μl of ice-cold
homogenization buffer. Protease inhibitor cocktail (P-8340) was added immediately
before use at a concentration of 5 μl per 100 mg of muscle tissue. Muscles were
homogenized with a hand-held glass homogenizer, and then centrifuged for 10 min
at 5,500g at 4 °C. Supernatant protein concentration was determined in triplicate.
Protein concentration was adjusted to 10 mg ml⁻¹ when possible, with
homogenization buffer. SERCA activity in whole-muscle homogenates was determined in
reaction buffer (200 mM KCl, 20 mM HEPES, pH 7.0, 15 mM MgCl2, 10 mM
NaN3, 10 mM phosphoenolpyruvate, 5 mM ATP, 1 mM EGTA). SERCA activity
in enriched SR vesicles was determined in reaction buffer (100 mM KCl, 20 mM
HEPES, 10 mM MgCl2, 10 mM NaN3, 10 mM phosphoenolpyruvate, 5 mM ATP
and 1 mM EGTA). The pH of both reaction buffers was adjusted to 7.0 at 37 °C.
Immediately before starting the reaction, 18 U ml⁻¹ PK, 18 U ml⁻¹ lactate
dehydrogenase (LDH), 5 μl NADH (100 mM), 1 μM calcimycin A-23187
(Sigma-Aldrich) and about 10 μl of whole-muscle homogenate or about 3 μl of
enriched SR vesicles were added to 1 ml of reaction buffer in a plastic cuvette.
Cuvettes were loaded into a spectrophotometer and A340 was measured at 37 °C
(Multiskan Spectrum; Thermo Electron, Waltham, Massachusetts). Maximal
(Vmax) and Ca²⁺-dependent SERCA activities were determined by progressively
adding Ca²⁺ until a plateau or maximal activity was reached. The specific SERCA
inhibitor 2′,5′-di(tert-butyl)-1,4-benzohydroquinone (TBQ) was added to a final
concentration of 40 μM to determine basal activity. SERCA enzyme kinetic
parameters, determined from a regression analysis, were the Ca²⁺ concentration
required to elicit 50% maximal SERCA activity ([Ca²⁺]50) and the Hill
coefficient, which is a measure of the cooperativity of Ca²⁺ binding of the SERCA
enzyme. SERCA activity was graphed against added Ca²⁺ concentration.
Nonlinear regression analysis was performed using the following variable-slope sigmoidal
dose-response relationship, using GraphPad Prism v. 3.02 (GraphPad Software
Inc., San Diego):
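The fitted relationship is not reproduced in this text. A standard variable-slope sigmoidal dose-response form, consistent with the two parameters defined in the paragraph, would be the following. The symbols Y_bot and Y_top for the lower and upper plateaus, and n_H for the Hill coefficient, are assumed notation rather than the authors' own.

```latex
Y = Y_{\mathrm{bot}} + \frac{Y_{\mathrm{top}} - Y_{\mathrm{bot}}}
         {1 + 10^{\left(\log_{10}[\mathrm{Ca}^{2+}]_{50} - X\right)\, n_{\mathrm{H}}}}
```

Here Y is SERCA activity and X is the base-10 logarithm of the added Ca²⁺ concentration, so that Y lies midway between the two plateaus when X equals log10 [Ca²⁺]50.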
Peroxynitrite-mediated SERCA inhibition assay. SERCA activity in enriched
SR vesicles after incubation in various concentrations of peroxynitrite
([ONOO⁻]) was measured. [ONOO⁻] was measured spectrophotometrically
(ε = 1,670 M⁻¹ cm⁻¹ at 302 nm; about 68 mM in stock). Enriched SR vesicle samples
were added to reaction buffer (as described previously) in the presence of ONOO⁻
(0, 10, 50, 100, 250, 500 or 1,000 μM) and incubated for 5 min at 37 °C in a plastic
cuvette. ONOO⁻ and byproducts were quenched by the addition of 1 mM
dithiothreitol (to prevent the inactivation of PK and LDH subsequently added);
18 U ml⁻¹ PK, 18 U ml⁻¹ LDH and 5 μl of NADH (100 mM) were then added.
Cuvettes were loaded into a spectrophotometer, and A340 was measured at 37 °C.
Maximal SERCA activity was determined by adding Ca²⁺ until a plateau or
maximal activity was reached. Once maximal activity had been determined, the
specific SERCA inhibitor TBQ was added to a final concentration of 40 μM to
determine basal activity. Maximal SERCA activity in each sample was assessed
independently at various ONOO⁻ concentrations (10, 50, 100, 250, 500 or
1,000 μM) and expressed as a percentage of the maximal SERCA activity in the
absence of ONOO⁻.
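Two small calculations are implied by this paragraph: converting an absorbance reading at 302 nm into a peroxynitrite concentration using the quoted extinction coefficient, and expressing activity as a percentage of the untreated control. The sketch below illustrates both. The function names, the 1-cm path length and the dilution factor are assumptions for illustration only.

```python
# Extinction coefficient for peroxynitrite at 302 nm, as quoted in the text.
EPSILON_ONOO = 1670.0  # M^-1 cm^-1


def peroxynitrite_molar(absorbance_302, dilution_factor=1.0, path_cm=1.0):
    """Beer-Lambert estimate of the undiluted stock concentration (mol per litre).

    dilution_factor and path_cm are assumed parameters; the text does not
    state how the stock was diluted before reading.
    """
    return absorbance_302 * dilution_factor / (EPSILON_ONOO * path_cm)


def percent_of_control(activity, control_activity):
    """Activity expressed as a percentage of the ONOO-free control."""
    return 100.0 * activity / control_activity


# Hypothetical reading: a 100-fold dilution giving A302 = 1.1356
stock_molar = peroxynitrite_molar(1.1356, dilution_factor=100)
print(f"stock: {stock_molar * 1000:.1f} mM")
```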

Human DMD samples. Human muscle specimens were sourced from the 
Telethon Network of Genetic Biobanks, Italy. Biopsies were taken from the vastus 
lateralis muscle of patients with DMD (aged 1-9 years) or healthy controls (aged 
18-25 years) using either a Bergström or open biopsy technique. All biopsies were
frozen immediately in liquid nitrogen and stored at −80 °C until analysis. Muscle
samples were homogenized in RIPA buffer (Millipore, Billerica, Massachusetts) 
and rotated for 1h at 4°C, followed by centrifugation for 15 min at 4°C. The 
supernatant was collected and protein concentration was determined by means of 
the bicinchoninic acid protein assay (Pierce Biotechnology). Electrophoresis was 
performed by 10% SDS-PAGE in a buffer containing 12 mM Tris-HCl pH 8.8, 
200 mM glycine and 0.1% SDS. Protein transfer was performed in a buffer
containing 12 mM Tris-HCl pH 8.3, 200 mM glycine and 10% methanol with the use of
poly(vinylidene difluoride) membranes. The membranes were blocked with 5% 
BSA in PBS, after which they were incubated overnight at 4°C with the primary 
antibody against Hsp70 (SPA-812; Stressgen) diluted 1:1000 in PBS. After being 
washed, the membranes were incubated for 1h with a goat anti-rabbit IgG 
antibody labelled with an infrared-fluorescent 800-nm dye (Alexa Fluor 800; 
Invitrogen) diluted 1:5,000 in PBS containing 50% Odyssey blocking buffer (LI- 
COR Biosciences) and 0.01% SDS. After being washed, the proteins were exposed 
on an Odyssey Infrared Imaging System (LI-COR Biosciences) and individual 
protein band optical densities were determined with ImageJ software (National
Institutes of Health). The blots were normalized against glyceraldehyde-3- 
phosphate dehydrogenase (GAPDH) protein (G8795; Sigma-Aldrich, Sydney, 
Australia). 

Evans blue dye uptake. To quantify muscle damage and areas of focal necrosis, 
EBD (1% w/v) was injected intraperitoneally (10 μl per gram body mass). Muscles
were frozen 20 h after the EBD injection. Sections 10 μm thick were cut on a
cryostat, and EBD was detected as red autofluorescence with the use of a fluor- 
escence microscope. 

Wire hang test. A wire hang test was employed to assess whole-body muscle 
strength and endurance. Mice were placed on a wire mesh grid, which they 
gripped; the grid was then inverted. Latency-to-fall onto a padded surface was 
recorded in three successive trials, with the average of the best two trials used for 
analyses. 
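The scoring rule described above (three trials, average of the best two) can be written out as a short sketch. It assumes that "best" means the longest latency-to-fall, which the text implies but does not state; the function name is hypothetical.

```python
def wire_hang_score(latencies_s):
    """Average of the two longest latencies (seconds) from three trials.

    Assumption: a longer hang time is a better trial.
    """
    if len(latencies_s) != 3:
        raise ValueError("expected exactly three trials")
    best_two = sorted(latencies_s, reverse=True)[:2]
    return sum(best_two) / 2


# Hypothetical latencies for one mouse
print(wire_hang_score([12.0, 30.0, 24.0]))
```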

Single muscle fibre analysis. Mice were anaesthetized with sodium pentobarbitone 
(Nembutal) such that they were unresponsive to tactile stimuli, and the extensor 
digitorum longus and/or soleus muscle was surgically excised for single fibre ana- 
lysis. The muscle was blotted on filter paper and placed in a Petri dish containing 
paraffin oil at room temperature (23 ± 2 °C). Muscles were pinned at resting length
to the base of a dish. Single muscle fibres were isolated from as close to the surface of 
the muscle as possible, and fine forceps were used to peel the sarcolemma away from 
the contractile apparatus under a dissecting microscope. The mechanically skinned 
fibre was then attached to one end of a piezoresistive force transducer (AE801 
SensoNor; Horten) using braided silk (size 10, 0.2 mm; Deknatel), and the other 
end of the fibre was clamped between a pair of forceps fixed to a micromanipulator. 
All experiments were conducted at room temperature. All solutions had pH of 
7.10 ± 0.01, and the free Mg²⁺ concentration ([Mg²⁺]) was 1 mM, unless specified
otherwise. Free [Ca²⁺] at 0.1 μM or more was verified with a Ca²⁺-sensitive
electrode (Orion Research).

Caffeine-induced force responses and SR Ca²⁺ accumulation. Initially,
mechanically skinned muscle fibres were equilibrated for 30 s in a wash solution
followed by thorough depletion of SR Ca²⁺ stores, achieved by transferring the fibre
preparation into a release solution containing 30 mM caffeine and 0.02 mM free
Mg²⁺. The presence of 0.5 mM EGTA in the release solution ensured that the level
of Ca²⁺ during caffeine-induced release did not maximally activate the contractile
apparatus, which is necessary to allow quantitative evaluation of the amount of
Ca²⁺ released. Ca²⁺ release from the SR was estimated from the relative areas
under the caffeine-induced force response. The fibre was left in the release solution
for 2 min to ensure complete SR Ca²⁺ depletion, before being washed for 30 s.
Thereafter, the SR was reloaded with Ca²⁺ in load solution (0.2 μM Ca²⁺ (pCa
6.7), where pCa = −log10[Ca²⁺]) for increasing durations (10, 20, 30 and 60 s),
before being equilibrated for 30 s in wash solution; subsequently, SR Ca²⁺ was
released in release solution. Data were fitted with a standard exponential
association equation to give the rate at which the SR accumulated Ca²⁺ (in s⁻¹) but not
the amount of SR Ca²⁺ accumulated. SR accumulation is an indirect measure of
SERCA activity.
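A one-phase exponential association has the form Y = Ymax(1 − e^(−kt)), where k is the rate constant in s⁻¹. The sketch below shows one way such a fit could be done with the standard library only; it is an illustration, not the authors' fitting procedure. For each trial value of k the best Ymax has a closed form, so only k is searched. The function name, the search bounds for k and the example data are assumptions.

```python
import math


def fit_exponential_association(times_s, responses):
    """Least-squares fit of Y = Ymax * (1 - exp(-k * t)).

    Returns (k, ymax). k is found by repeated grid refinement; for each
    candidate k the optimal Ymax is computed analytically.
    """
    def ymax_and_sse(k):
        basis = [1.0 - math.exp(-k * t) for t in times_s]
        ymax = (sum(y * b for y, b in zip(responses, basis))
                / sum(b * b for b in basis))
        sse = sum((y - ymax * b) ** 2 for y, b in zip(responses, basis))
        return ymax, sse

    lo, hi = 1e-4, 1.0  # assumed search bounds for k (s^-1)
    k_best = lo
    for _ in range(6):  # each pass narrows the bracket about 100-fold
        step = (hi - lo) / 200
        candidates = [lo + i * step for i in range(201)]
        k_best = min(candidates, key=lambda k: ymax_and_sse(k)[1])
        lo, hi = max(k_best - step, 1e-6), k_best + step
    return k_best, ymax_and_sse(k_best)[0]


# Synthetic example at the load durations used in the text (10, 20, 30, 60 s)
load_times = [10, 20, 30, 60]
areas = [100 * (1 - math.exp(-0.05 * t)) for t in load_times]
k, ymax = fit_exponential_association(load_times, areas)
print(f"k = {k:.4f} s^-1, Ymax = {ymax:.1f}")
```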

SR Ca²⁺ leak. The percentage of Ca²⁺ lost from the SR as a result of the passive
leak was also assessed. The fibre was loaded for 20 s in loading solution. The fibre
preparation was then placed in wash solution for 30 s followed by SR Ca²⁺ content
released in release solution (Ca²⁺ leak in 30 s). The fibre preparation was placed in
wash solution before reloading for 20 s in load solution and transferred to wash
solution for 90 s; the remaining SR Ca²⁺ was released in release solution (Ca²⁺ leak
in 90 s). The 30-s Ca²⁺ leak was then repeated, and the area (corrected for
proportionality between area and SR Ca²⁺ content) under the test run was divided by
the average of the areas under the caffeine-induced force responses in the controls
before and after the test run. This gave an estimate of the fraction of SR Ca²⁺
leaked over a 60-s leak period.

Relative SR Ca²⁺ sensitivity. To determine the effect of Hsp72 overexpression on
the ryanodine receptor (RyR), a caffeine dose-response curve was determined
from the forces produced by the contractile apparatus after SR Ca²⁺ release
induced by low caffeine concentrations. Each fibre was prepared by completely
depleting the SR of Ca²⁺ with 30 mM caffeine followed by a 30-s SR Ca²⁺
reloading duration. The peak force of caffeine-induced contraction was determined in a
series of potassium hexamethylene-diamine tetraacetic acid solutions containing
2, 3, 5 and 7 mM caffeine and 50 μM EGTA, with complete SR Ca²⁺ depletion and
30-s reloading duration of the SR with Ca²⁺ between each caffeine contraction.
The peaks of the caffeine-induced contractions were then normalized as a
percentage of maximum Ca²⁺-activated force to estimate the sensitivity of the
RyR to caffeine. This determined whether overexpression of Hsp72 directly
affected the function of the RyR.

Properties of the contractile apparatus. After SR properties were investigated,
the single muscle fibres were equilibrated in a relaxing solution (pCa > 9) for
2 min. Fibres were placed in a maximum Ca²⁺-activating solution (pCa ~ 4.5)
until force reached the maximal value (maximum Ca²⁺-activated force) and then
placed back in the relaxing solution for a further 2 min. Force responses were
generated by exposing the fibre to activating solutions of progressively lower
pCa (higher [Ca²⁺]) in a stepwise fashion. The force response generated at each
pCa was expressed as a percentage of the interpolated values for maximum
Ca²⁺-activated force. Data points were fitted with a Hill equation producing two
parameters: the pCa50 (the pCa producing half-maximum force) and nH (the Hill
coefficient, indicative of the steepness of the force-pCa relationship).
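The two fitted parameters can be read directly from the usual form of the Hill equation for a force-pCa curve. The sketch below evaluates that form; it assumes force is already expressed as a percentage of maximum Ca²⁺-activated force, and the function name and example parameter values are hypothetical.

```python
def relative_force_percent(pca, pca50, hill_coefficient):
    """Hill equation for the force-pCa relationship.

    Returns force as a percentage of maximum Ca2+-activated force.
    Lower pCa means higher [Ca2+], so force rises as pCa falls.
    """
    return 100.0 / (1.0 + 10.0 ** (hill_coefficient * (pca - pca50)))


# Hypothetical parameters: pCa50 = 5.9, nH = 4
for pca in (9.0, 6.2, 5.9, 5.6, 4.5):
    print(pca, round(relative_force_percent(pca, 5.9, 4.0), 1))
```

By construction the curve passes through 50% at pCa = pCa50, and a larger Hill coefficient gives a steeper transition between relaxed and fully activated force.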

Western blotting. After the determination of SERCA activity in whole-muscle 
homogenates and enriched SR vesicles, electrophoresis was performed on the 
same samples for the quantification of protein expression. Equal amounts of 
protein were resolved in SDS buffer, heated for 5 min at 95°C and separated on 
SDS-polyacrylamide gels. Separated proteins were transferred to poly(vinylidene 
difluoride) membranes (0.45-μm Immobilon-P; Millipore). Membranes were
blocked with 5% non-fat milk powder (or BSA) in Tris-buffered saline containing
Tween 20 for 1 h and incubated overnight at 4 °C with appropriate antibody
dilutions. Antibodies against SERCA1 (Affinity Bioreagents), SERCA 2a
(Affinity Bioreagents), Hsp72 (Stressgen), α-tubulin (ECM-Biosciences) and
GAPDH (Sigma) were used. Membranes were digitized with a chemiluminescent 
detection and imaging system (ChemiDoc XRS; Bio-Rad Laboratories, Hercules, 
California, USA) and band densities were quantified with Quantity One analysis 
software (version 4.6.1; Bio-Rad Laboratories). 

Real-time PCR. Real-time PCR was performed as described previously. Each 
sample was run in triplicate. The mean reading of each triplicate was converted to 
an absolute content by using a DNA standard curve, based on a serial dilution of 
DNA standard (100-10,000 ng ml⁻¹; Molecular Probes) run together with the
samples on each plate. Gene expression was quantified by normalizing the
logarithmic cycle threshold (CT) value (2^−CT) to the cDNA content of each sample
to obtain the expression 2^−CT/cDNA content (ng ml⁻¹). In Supplementary Fig. 5b,
mRNA for SERCA 1 and SERCA 2a was normalized to the eukaryotic 18S
housekeeping gene.
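The normalization described above can be illustrated with a short sketch. It assumes the expression term is 2 raised to the negative cycle threshold, averaged over the triplicate, and divided by the cDNA content; the function name and the example values are hypothetical.

```python
def normalized_expression(ct_triplicate, cdna_ng_per_ml):
    """Expression as 2**(-mean CT) divided by cDNA content (ng per ml).

    Assumption: the triplicate CT readings are averaged before the
    power-of-two transform, as the text describes taking the mean reading.
    """
    mean_ct = sum(ct_triplicate) / len(ct_triplicate)
    return 2.0 ** (-mean_ct) / cdna_ng_per_ml


# Hypothetical samples with equal cDNA content: one cycle earlier means
# twice the expression.
sample_a = normalized_expression([19.0, 19.0, 19.0], 2.0)
sample_b = normalized_expression([20.0, 20.0, 20.0], 2.0)
print(sample_a / sample_b)
```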

Creatine kinase analysis. Serum CK activity was analysed as an overall measure of 
whole-body muscle breakdown. Blood was collected from the tail vein and cen- 
trifuged for 10 min at 10,000g and 4 °C to isolate the serum fraction. Serum CK 
activity was then determined with a commercially available creatine kinase assay 
kit (ECPK-100) in accordance with the manufacturer’s instructions (BioAssay 
Systems). 

Skeletal preparation. Skeletal architecture was revealed as described previously.
After death, the mice were skinned, eviscerated and placed in a KOH solution 
(1.5% w/v), for 5 days. KOH solution was replaced and a small amount of alizarin 
red was added to stain calcium deposits; the preparation was left for a further 
5 days to produce the specimens shown in Fig. 4a. 

Statistical analysis. All values are presented as means ± s.e.m. Unpaired Student’s
t-tests were used to compare between two groups. When comparisons were being 
made between more than two groups, a one-way analysis of variance was used with 
Newman-Keuls post-hoc multiple comparison test. 
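The two simplest quantities named here, the mean ± s.e.m. and the unpaired Student's t statistic, can be sketched with the standard library alone. This is an illustration under the equal-variance (pooled) form of the test, not the authors' analysis code; the function names and example data are hypothetical, and the P value lookup and the Newman-Keuls procedure are not covered.

```python
import math
import statistics


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def unpaired_t_statistic(group_a, group_b):
    """Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical measurements for two groups
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
print(mean_and_sem(control))
print(unpaired_t_statistic(control, treated))
```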
Inositol-1,4,5-trisphosphate receptor regulates
hepatic gluconeogenesis in fasting and diabetes 


Yiguo Wang, Gang Li, Jason Goode, Jose C. Paz, Kunfu Ouyang, Robert Screaton, Wolfgang H. Fischer, Ju Chen,
In the fasted state, increases in circulating glucagon promote hepatic
glucose production through induction of the gluconeogenic pro- 
gram. Triggering of the cyclic AMP pathway increases gluconeogenic 
gene expression via the de-phosphorylation of the CREB co-activator 
CRTC2 (ref. 1). Glucagon promotes CRTC2 dephosphorylation in 
part through the protein kinase A (PKA)-mediated inhibition of the 
CRTC2 kinase SIK2. A number of Ser/Thr phosphatases seem to be 
capable of dephosphorylating CRTC2 (refs 2, 3), but the mechanisms
by which hormonal cues regulate these enzymes remain unclear. Here 
we show in mice that glucagon stimulates CRTC2 dephosphorylation 
in hepatocytes by mobilizing intracellular calcium stores and activ- 
ating the calcium/calmodulin-dependent Ser/Thr-phosphatase 
calcineurin (also known as PP3CA). Glucagon increased cytosolic 
calcium concentration through the PKA-mediated phosphoryla- 
tion of inositol-1,4,5-trisphosphate receptors (InsP3Rs), which 
associate with CRTC2. After their activation, InsP3Rs enhanced
gluconeogenic gene expression by promoting the calcineurin- 
mediated dephosphorylation of CRTC2. During feeding, increases 
in insulin signalling reduced CRTC2 activity via the AKT-mediated 
inactivation of InsP3Rs. InsP3R activity was increased in diabetes, 
leading to upregulation of the gluconeogenic program. As hepatic 
downregulation of InsP3Rs and calcineurin improved circulating 
glucose levels in insulin resistance, these results demonstrate how 
interactions between cAMP and calcium pathways at the level of the 
InsP3R modulate hepatic glucose production under fasting condi- 
tions and in diabetes. 

We tested a series of Ser/Thr protein phosphatase inhibitors for 
their ability to block CRTC2 activation in response to glucagon. 
Exposure to the calcineurin inhibitor cyclosporine A (CsA) disrupted 
the glucagon-induced dephosphorylation and nuclear translocation of 
CRTC2, but okadaic acid, an inhibitor of PP1, PP2A and PP4 did not 
(Fig. 1a and Supplementary Fig. 1a). CsA and other calcineurin
inhibitors also reduced cAMP response element (CRE)-luciferase (Luc)
reporter activity (Fig. 1a and Supplementary Fig. 1b), but they had no
effect in cells expressing phosphorylation-defective (Ser 171, 275 Ala) 
and therefore active forms of CRTC2 (Supplementary Fig. 1c-e). 

On the basis of the ability of CsA to interfere with CRTC2 activation, 
we considered that calcineurin may promote the dephosphorylation of 
CRTC2 in response to glucagon. Supporting this idea, CRTC2 contains 
two consensus (PXIXIT) motifs that mediate an association with
calcineurin (Supplementary Fig. 2a, b). Moreover, mutation of both motifs
disrupted the glucagon-dependent dephosphorylation of CRTC2 
(Fig. 1b) and prevented its nuclear translocation (Supplementary Fig. 
2c), thereby down-regulating CRE-Luc activation (Fig. 1b). 

On the basis of these results, we tested whether calcineurin mod- 
ulates expression of the gluconeogenic program. Adenoviral over- 
expression of the calcineurin catalytic subunit in hepatocytes augmented 
CRTC2 dephosphorylation, CRE-Luc activity, and glucose secretion in
response to glucagon, whereas calcineurin knockdown had the opposite
effect (Fig. 1c). Although calcineurin could, in principle, modulate 
CRTC2 activity indirectly through effects on cAMP signalling, calci- 
neurin overexpression or knockdown did not alter the phosphorylation 
of cellular PKA substrates in cells exposed to glucagon (Supplementary 
Fig. 2d). 

We examined next whether calcineurin modulates hepatic gluconeo- 
genesis in vivo. Modest (twofold) overexpression of calcineurin in liver 
increased gluconeogenic gene expression, hepatic CRE-Luc activity, 
and fasting blood glucose concentrations (Fig. 1d and Supplementary 
Fig. 3a). Conversely, knockdown of hepatic calcineurin reduced 
expression of the gluconeogenic program and lowered circulating 
glucose levels (Fig. 1d and Supplementary Fig. 3b), demonstrating that 
this phosphatase contributes to fasting adaptation in the liver. 
Calcineurin seemed to stimulate gluconeogenesis via the CREB 
pathway; depletion of CRTC2 blocked the effects of calcineurin over- 
expression (Supplementary Fig. 4). 

Realizing that calcineurin activity is dependent on increases in intra- 
cellular calcium, we tested whether the cAMP pathway stimulates 
calcium mobilization. Exposure of primary hepatocytes to glucagon 
triggered a rapid increase in cellular free calcium (Fig. 2a and 
Supplementary Fig. 5a); these effects were partially reversed by co- 
treatment with the PKA inhibitor H89 (Supplementary Fig. 5b). The 
rise in intracellular calcium seems to be critical for CRTC2 activation, 
because co-incubation with the calcium chelator BAPTA disrupted 
CRTC2 dephosphorylation and CRE-Luc activation in response to 
glucagon (Fig. 2b). Arguing against an effect of calcium on cAMP 
signalling, exposure to BAPTA did not block the PKA-mediated phos- 
phorylation of CREB in response to glucagon. 

We imagined that cAMP may increase calcium mobilization 
through the PKA-dependent phosphorylation of an intracellular 
calcium channel. In mass spectrometry studies to identify proteins 
that undergo phosphorylation by PKA in response to glucagon, we 
recovered the inositol 1,4,5-trisphosphate receptor 1 (InsP3R1) from 
immunoprecipitates of phospho-PKA substrate antiserum (Sup- 
plementary Fig. 5c). InsP3R1 and its related family members 
(InsP3R2, InsP3R3) are calcium release channels that promote the 
mobilization of endoplasmic reticulum calcium stores following their 
activation in response to extracellular signals. Moreover, cAMP
agonists have also been shown to enhance InsP3R activity
through PKA-mediated phosphorylation. 

Inhibiting InsP3Rs, either by exposure of hepatocytes to xestospongin 
C or by knockdown of all three InsP3Rs, disrupted cytosolic calcium 
mobilization and calcineurin activation in response to glucagon and 
forskolin (Fig. 2a and Supplementary Fig. 6a). Moreover, xestospongin 
C treatment and InsP3R knockdown also blocked the effects of glucagon 
on CRTC2 dephosphorylation, CRE-Luc activation, and induction of 
the gluconeogenic program (Fig. 2c and Supplementary Fig. 6a, b). We 
Figure 1 | Calcineurin promotes CRTC2 activation during fasting. a, Effect
of Ser/Thr phosphatase inhibitors (okadaic acid (OA), CsA) on CRTC2
dephosphorylation and CRE-Luc reporter activation (*P < 0.001; n = 3). Gcg,
glucagon; Veh, vehicle. b, Effect of glucagon on dephosphorylation (left) and
activity (right) of wild-type (WT) and calcineurin-defective (ΔCalna) CRTC2
in hepatocytes (*P < 0.001; n = 3). c, Effect of calcineurin A overexpression
(left) or knockdown (right) on CRTC2 dephosphorylation (top), CRE-Luc
reporter activity (middle, *P < 0.001; n = 3), and glucose output (bottom,
*P < 0.001; n = 3) from hepatocytes. US, unspecific. d, Effect of hepatic
calcineurin overexpression (left) or knockdown (right) on CRE-Luc activity,
gluconeogenic gene (Pck1, G6pc) expression, and blood glucose concentrations
in 6-8 h fasted mice (*P < 0.01; n = 5). Data are shown as mean ± s.e.m.


Figure 2 | Glucagon stimulates 
CRTC2 dephosphorylation via 
activation of InsP3Rs. a, Effect of 
glucagon (Gcg) on calcium 
mobilization in hepatocytes by 
fluorescence imaging. Calcium 
mobilization and calcineurin 
activation following knockdown of 
all three InsP3R family members 
shown (*P < 0.001; 1 = 3). b, Effect 
of calcium chelator (BAPTA) on 
CRTC2 dephosphorylation and 
CRE-Luc activation (*P < 0.001; 
n= 3). c Effect of InsP3R depletion 
on CRTC2 dephosphorylation, CRE- 
Luc activity, and glucose output from 
hepatocytes (*P < 0.001; n = 3). 

d, Effect of hepatic InsP3R 
knockdown on CRE-Luc activity, 
blood glucose, and gluconeogenic 
gene expression (*P < 0.01; n = 5). 
Data are shown as mean + s.e.m. 


Insp3r 


x 
oe yw 


confirmed the effects of InsP3R depletion using hepatocytes from mice 
with a knockout of the InsP3R2 (ref. 10), the predominant InsP3R 
isoform in these cells (Supplementary Fig. 6c-e). 

On the basis of these results, InsP3Rs would also be expected to 
modulate fasting glucose production in vivo. Decreasing hepatic 
InsP3R expression, either by knockdown of all three InsP3Rs in liver 
or by targeted disruption of the Insp3r2 gene, reduced fasting CRE-Luc 
activity, gluconeogenic gene expression, and circulating glucose 
concentrations, demonstrating the importance of these receptors in 
glucose homeostasis (Fig. 2d and Supplementary Fig. 7). 

We tested whether glucagon modulates InsP3R activity through 
PKA-mediated phosphorylation. Exposure of hepatocytes to glucagon 
increased the phosphorylation of InsP3R1 as well as InsP3R2 and 
InsP3R3 by immunoblot assay with phospho-PKA substrate antiserum; 
these effects were blocked by the PKA inhibitor H89 (Fig. 3a and 
Supplementary Fig. 8a). Moreover, mutation of serine residues at 
consensus PKA sites in InsP3R1 (Ser 1589, Ser 1756) to alanine completely 
disrupted InsP3R1 phosphorylation in response to glucagon (Fig. 3b). 
As a result, overexpression of PKA-defective (S1589,1756A) InsP3R1 
interfered with calcium mobilization and calcineurin activation, and it 
reduced CRE-Luc activation and glucose secretion from hepatocytes 
exposed to glucagon (Fig. 3b-d). 

Similar to glucagon, fasting also stimulated hepatic InsP3R1 phos- 
phorylation at Ser 1589 and Ser 1756 (Supplementary Fig. 8b). And 
overexpression of PKA-defective InsP3R1 reduced fasting CRE-Luc 
induction, calcineurin activation, and gluconeogenic gene expression, 
leading to lower circulating glucose concentrations (Fig. 3e and Sup- 
plementary Fig. 8c, d). Taken together, these results support an 
important role for the PKA-mediated phosphorylation of InsP3R in 
hepatic gluconeogenesis. 

We considered that the proximity of CRTC2 to the calcium signalling 
machinery may be important for its activation. Supporting this 
notion, CRTC2 was found to associate with InsP3R1 via its amino-terminal 
CREB binding domain (CBD) in co-immunoprecipitation 
assays (Fig. 3f and Supplementary Fig. 9a-d). Moreover, CRTC2 was 
enriched in endoplasmic reticulum containing high density microsomal 
fractions, which also contain the InsP3Rs (Supplementary 
Fig. 9e). The InsP3R-CRTC2 association seems to be critical for 
CRTC2 localization in the perinuclear space, because RNA interference 
(RNAi)-mediated knockdown of the InsP3Rs led to redistribution 
of CRTC2 in the cytoplasm (Supplementary Fig. 9f). Disrupting the 
CRTC2-InsP3R interaction, by deletion of the CBD in CRTC2 or by 
addition of an N-terminal myristoylation signal that targets CRTC2 to 
the plasma membrane, blocked CRTC2 dephosphorylation and CRE-reporter 
activation in response to glucagon (Supplementary Fig. 9g-i). 
Taken together, these results suggest that the association of CRTC2 
with InsP3Rs enhances its sensitivity to fasting signals. 

Under feeding conditions, insulin inhibits gluconeogenesis in part 
by increasing CRTC2 phosphorylation. We wondered whether insulin 
interferes with InsP3R effects on CRTC2 activation. Supporting this 
idea, AKT has been shown to block calcium mobilization by 
phosphorylating InsP3Rs at Ser 2682 (in InsP3R1)''. Indeed, exposure of 
hepatocytes to insulin increased InsP3R phosphorylation by immunoblot 
analysis with phospho-AKT substrate antiserum (Supplementary 
Fig. 10a); mutation of Ser 2682 (in InsP3R1) to alanine blocked these 
effects. Insulin treatment also reduced glucagon-dependent increases 
in calcium mobilization and calcineurin activation in cells expressing 
wild-type InsP3R1, but it had no effect in cells expressing AKT-defective 
(S2682A) InsP3R1 (Supplementary Fig. 10b). As a result, CRTC2 
dephosphorylation, CRE-Luc activity, and glucose output were elevated 
in hepatocytes expressing InsP3R(S2682A) compared to wild type 
(Supplementary Fig. 10c). 

We examined whether InsP3R1 phosphorylation by AKT is important 
in regulating hepatic glucose production in vivo. In line with this 
notion, feeding increased hepatic InsP3R1 phosphorylation at Ser 2682 
(Supplementary Fig. 8b). Moreover, overexpression of AKT-defective 
InsP3R1 partially suppressed feeding-induced decreases in CRE-Luc 
activity and gluconeogenic gene expression, leading to elevations in 
circulating glucose concentrations (Supplementary Fig. 10d). Taken 
together, these results suggest that the AKT-mediated phosphorylation 
of InsP3Rs during feeding inhibits hepatic gluconeogenesis by 
blocking the calcineurin-dependent dephosphorylation of CRTC2. 

We wondered whether hepatic InsP3R signalling contributes to 
increases in gluconeogenesis in the setting of insulin resistance. 
Supporting this notion, hepatic calcineurin activity was enhanced in 
both ob/ob and db/db diabetic animals, leading to increases in CRE-Luc 
activity (Fig. 4a and Supplementary Fig. 11a, b). Pointing to a role 
for InsP3R, hepatic amounts of PKA-phosphorylated, active InsP3Rs 
were increased in these diabetic mice, whereas amounts of AKT-phosphorylated, 
inactive InsP3Rs were reduced (Fig. 4b). Correspondingly, 
knockdown of either calcineurin or InsP3Rs in db/db mice 
reduced CRE-Luc activity, gluconeogenic gene expression, and hepatic 
gluconeogenesis (Fig. 4c and Supplementary Fig. 11c). 

Collectively, our results demonstrate that glucagon promotes 
CRTC2 dephosphorylation during fasting by triggering increases in 
cytoplasmic calcium that lead to calcineurin activation (Supplementary 
Fig. 12). The ability of glucagon to increase calcium signalling via 
the PKA-mediated phosphorylation of InsP3Rs demonstrates an 
important regulatory node for cross-talk between cAMP and calcium 
signalling pathways in liver and perhaps other insulin sensitive tissues. 
The partial inhibition of calcium entry by the PKA inhibitor H89 also 
points to additional regulatory inputs'*”* that may function with PKA 
to increase InsP3R activity in response to glucagon. CRTC2 has also 
been found to stimulate metabolic gene expression by upregulating 
the nuclear hormone receptor co-activator PGC-1α in liver'*!’ and 
muscle’’. On the basis of the well-recognized role of calcium signalling 
in PGC-1α-dependent transcription, InsP3Rs may also have an 
important function in this setting. 

Figure 3 | Glucagon stimulates CRTC2 activity via PKA-dependent 
phosphorylation of InsP3Rs. a, b, Immunoblots of InsP3R1 
immunoprecipitates (IP) using phospho-PKA substrate antiserum to show 
effect of H89 (a) and Ala mutations (b) at one or both (DM) PKA consensus 
sites (Ser 1589, Ser 1756) on InsP3R1 phosphorylation in hepatocytes exposed 
to glucagon (Gcg). Effect of wild-type and PKA-mutant InsP3R1 on calcium 
mobilization in response to Gcg (b) shown (*P < 0.001; n = 3). c, d, Effect of 
wild-type or PKA-defective InsP3R1 (DM) on calcineurin (Calna) activation 
(c) and CRTC2 dephosphorylation (c), as well as CRE-Luc activation (d) and 
glucose output (d) from hepatocytes (*P < 0.001; n = 3). e, Effect of wild-type 
and PKA-defective InsP3R1 on hepatic CRE-Luc activity, fasting blood glucose, 
and gluconeogenic gene expression (G6pc, Pck1) (*P < 0.01 versus wild type; 
n = 5). f, Co-immunoprecipitation of CRTC2 with InsP3R1 in primary 
hepatocytes. Exposure to glucagon (100 nM, 15 min) indicated. Input levels of 
CRTC2 and InsP3R1 in nuclear (Nu) and post-nuclear (p/Nu) supernatant 
fractions shown. Data are shown as mean ± s.e.m. 

Figure 4 | InsP3R activity is upregulated in diabetes. a, Hepatic CRE-Luc 
and calcineurin activity in lean and db/db mice (*P < 0.001; n = 5). 
b, Immunoblots showing relative amounts and phosphorylation of InsP3R 
family members in livers of ad libitum fed lean, db/db, or ob/ob mice. InsP3R 
phosphorylation at PKA or AKT sites indicated. c, Effect of RNAi-mediated 
depletion of InsP3Rs or calcineurin A on CRE-Luc activity, gluconeogenic gene 
expression, and hepatic glucose production in db/db mice, determined by 
pyruvate tolerance testing (*P < 0.01; n = 5). Data are shown as mean ± s.e.m. 


METHODS SUMMARY 


Adenoviruses were delivered by tail vein injection'’. Hepatic CRE-Luc activity was 
visualized using an IVIS Imaging system. Mice were imaged 3-5 days after injection 
of CRE-Luc adenovirus. Pyruvate tolerance testing was performed on mice fasted 
overnight and injected intraperitoneally with pyruvate (2 g kg⁻¹). Insp3r2 knockout 
mice have been described’®. Cultured primary mouse hepatocytes were prepared as 
reported’*. Cellular fractionation studies were conducted using primary mouse 
hepatocytes’®. Calcium imaging experiments were performed using a CCD camera 
on primary hepatocytes loaded with fura-2 dye. Mass spectrometry studies were 
performed on CRTC2 immunoprecipitates prepared from HEK293T cells and on 
immunoprecipitates of phospho-PKA substrate antiserum prepared from primary 
hepatocytes exposed to glucagon. Anti-InsP3R1 (A302-158A) and InsP3R3 (A302-160A) 
antibodies were purchased from Bethyl Laboratories, anti-InsP3R2 (ab77838) 
antiserum was from Abcam, anti-calcineurin (610260) from BD Biosciences, anti- 
GRP78 (ADI-SPA-826-F) from Enzo Life Sciences, anti-phospho-PKA substrate 
(RRXS/T, 9624), anti-phospho-AKT substrate (RXXS/T, 9614) and CRTC2 (pS171, 
2892) from Cell Signaling. Phospho (Ser 275) CRTC2 antibody was used as 
described’. For more details, see Supplementary Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Mouse strains and adenovirus. Adenoviruses (1 X 10° plaque forming units 
(p.f.u.) GFP, calcineurin, InsP3R1, InsP3R1 DM (S1589A/S1756A), unspecific 
RNAi, calcineurin RNAi, Insp3r1 RNAi, Insp3r2 RNAi, Insp3r3 RNAi, Crtc2 
RNAi, 1 X 10” p.f.u. CRE-Luc reporter, 5 X 10’ p.f.u. RSV β-galactosidase) were 
delivered to 8-10-week-old male C57BL/6J, B6.V-Lep<ob>/J, B6.Cg-m+/+ Lepr<db>/J 
mice by tail vein injection’. Insp3r2 knockout mice were described 
previously’®. All mice were adapted to their environment for 1 week before study 
and were housed in colony cages with a 12h light/dark cycle in a temperature- 
controlled environment. For in vivo imaging experiments, mice were imaged on 
day 3-5 after adenovirus delivery. Wild-type CRTC2, CRTC2(S171A), GFP, 
unspecific RNAi, Crtc2 RNAi, CRE-Luc and RSV β-gal adenoviruses have been 
described previously’’”°. The adenoviruses containing rat InsP3R1, InsP3R1 DM 
and InsP3R1(S2682A) were generated from the InsP3R1 plasmid, provided by I. 
Bezprozvanny (UT Southwestern Medical Center at Dallas). Calcineurin 
adenovirus was constructed using a mouse calcineurin plasmid (Addgene). 
CRTC2 ΔCBD (amino acids 51-692), S275A and S171A/S275A adenoviruses 
were made from mouse CRTC2. Myristoylated CRTC2 (Myr-CRTC2) adenovirus 
was generated with mouse CRTC2 fused to an N-terminal myristoylation tag 
(MGSSKSKPKDPSQR) from Src. Calcineurin RNAi, Insp3r1 RNAi, Insp3r2 
RNAi and Insp3r3 RNAi adenoviruses were constructed using the sequences 
5'-GGGTACCGCATGTACAGGAAAA-3', 5'-GGGTACTGGAATAGCCTCTTCC-3', 
5'-GGGTAACAAGCACCACCATCCC-3' and 5'-GGGCAAGCTGCAGGTGTTCCTG-3', 
respectively. All expressed constructs used in this study were 
confirmed by sequencing. 

In vivo analysis. For in vivo imaging, mice were imaged as described’””° under ad 
libitum feeding conditions or after fasting for 6 h. For pyruvate challenge 
experiments, mice were fasted overnight and injected intraperitoneally with 
pyruvate (2 g kg⁻¹). Blood glucose values were determined using a LifeScan 
automatic glucometer. For immunoblot, mouse tissues were sonicated and 
centrifuged, and supernatants were reserved for protein determinations and 
SDS-PAGE analysis. 

In vitro analysis. HEK293T (ATCC) cells were cultured in DMEM containing 
10% FBS (HyClone), 100 mg ml⁻¹ penicillin-streptomycin. Mouse primary 
hepatocytes were isolated and cultured as previously described". Cellular fractionation 
studies were conducted as previously reported'*. For reporter studies, 
Ad-CRE-Luc-infected hepatocytes (1 p.f.u. per cell) were exposed to glucagon (100 nM) for 
2 to ~4 h. For CsA (10 μM), okadaic acid (100 nM), cell permeable calcineurin 
autoinhibitory peptide (10 μM), CN585 (100 μM), calyculin A (10 nM), 
xestospongin C (2 μM), H89 (30 μM) or BAPTA (50 μM) inhibition, hepatocytes 
were pre-treated with the inhibitors for 1 h. Luciferase activities were normalized 
to β-galactosidase activity from adenoviral-encoded RSV β-galactosidase. 
Calcineurin activity (test kit from Enzo Life Sciences) and cellular cAMP levels 
(test kit from Cayman Chemical Company) were measured according to the 
manufacturer’s instructions. 
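
The reporter normalization described above can be sketched as follows. This is only an illustration: the function names and the example readings are hypothetical and do not come from the study.

```python
def normalize_luciferase(luciferase, beta_gal):
    """Divide a raw luciferase reading by the beta-galactosidase
    reading from the same sample (both in arbitrary units)."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def fold_over_vehicle(treated, vehicle):
    """Express normalized reporter activity relative to the vehicle control."""
    return treated / vehicle


# Hypothetical (luciferase, beta-gal) readings, not measured values
vehicle = normalize_luciferase(1200.0, 40.0)
glucagon = normalize_luciferase(9000.0, 50.0)
print(fold_over_vehicle(glucagon, vehicle))  # 6.0
```

Dividing by the co-delivered β-galactosidase signal corrects each sample for differences in adenoviral delivery before treatments are compared.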

Calcium imaging. Mouse primary hepatocytes were plated on glass coverslips and 
loaded with 5 μM Fura-2 acetoxymethyl ester (Molecular Probes) in the presence of 
0.025% (w/v) pluronic F127 (Sigma-Aldrich) in Media 199 (Mediatech) for 30 min. 
Coverslips were mounted on a laminar flow perfusion chamber (Warner 
Instruments) and perfused with Media 199 or a solution of 100 nM glucagon in 
Media 199. Images of Fura-2 loaded cells were collected with a cooled CCD camera 
while the excitation wavelength was alternated between 340 nm and 380 nm. The 
ratio of fluorescence intensity at the two excitation wavelengths was calculated after 
subtracting background fluorescence. [Ca²⁺]i (cytosolic free calcium 
concentration) was calculated using a Fura-2 calcium imaging calibration kit (Invitrogen). 
Images were collected and analysed using the MetaFluor software package 
(Universal Imaging Corp.). Graphs represent average responses from groups of 
30-40 individual cells from representative single experiments. Bar graphs represent 
average responses (fold over average baseline) from 150-200 cells per condition. All 
experiments were repeated at least three times with similar results. 
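
The ratio calculation above can be sketched in code. The text states only that a commercial calibration kit was used; the conversion below uses the standard ratiometric relation [Ca²⁺] = Kd · β · (R − Rmin)/(Rmax − R), which is an assumption here, and every number (Kd, β, Rmin, Rmax, intensities) is a hypothetical placeholder rather than a value from the study.

```python
def background_subtracted_ratio(f340, f380, bg340, bg380):
    """Fura-2 340/380 excitation ratio after subtracting background."""
    return (f340 - bg340) / (f380 - bg380)


def calcium_from_ratio(ratio, r_min, r_max, kd_nm, beta):
    """Convert a ratio to free calcium (nM) with the standard
    ratiometric calibration relation (assumed, see lead-in)."""
    return kd_nm * beta * (ratio - r_min) / (r_max - ratio)


def fold_over_baseline(trace, n_baseline):
    """Scale a trace by the mean of its first n_baseline points."""
    baseline = sum(trace[:n_baseline]) / n_baseline
    return [value / baseline for value in trace]


# Hypothetical intensities and calibration constants
ratio = background_subtracted_ratio(1200.0, 600.0, 200.0, 100.0)
calcium_nm = calcium_from_ratio(ratio, r_min=0.5, r_max=10.0, kd_nm=224.0, beta=5.0)
print(ratio, calcium_nm)
```

The last helper mirrors the "fold over average baseline" summary used for the bar graphs.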

Immunoblot, immunoprecipitation and immunostaining. Immunoblot, 
immunoprecipitation and immunostaining assays were performed as described'*. 
CRTC2, pCREB (Ser 133), CREB, pAKT (Thr 308), AKT, tubulin, HA and Flag 
antibodies were previously described'*. The antibodies anti-InsP3R1 (A302-158A) 
and InsP3R3 (A302-160A) were purchased from Bethyl Laboratories, anti-InsP3R2 
(ab77838) from Abcam, anti-calcineurin (610260) from BD Biosciences, anti- 
GRP78 (ADI-SPA-826-F) from Enzo Life Sciences, anti-phospho-PKA substrate 
(RRXS/T, 9624), anti-phospho-AKT substrate (RXXS/T, 9614) and CRTC2 
(pS171, 2892) from Cell Signaling. CRTC2 (pS275) antibody was used as 
described’. 

Quantitative PCR. Total cellular RNAs from whole liver or from primary 
hepatocytes were extracted using the RNeasy kit (Qiagen) and used to generate 
cDNA with SuperScript II enzyme (Invitrogen). cDNAs were analysed by 
quantitative PCR as described”®. 

Mass spectrometry. Immunoprecipitates of endogenous CRTC2 from HEK293T 
cells and of phospho-PKA substrate antiserum from glucagon-stimulated 
hepatocytes were prepared for mass spectrometric studies as previously reported”’, 
and analysed by electrospray ionization tandem mass spectrometry on a Thermo 
LTQ Orbitrap instrument. 

Statistical analyses. All studies were performed on at least three independent 
occasions. Results are reported as mean ± s.e.m. The comparison of different 
groups was carried out using two-tailed unpaired Student's t-test. Differences were 
considered statistically significant at P< 0.05. 
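
The summary statistics named above can be sketched with the Python standard library. The helper names and data are hypothetical. The sketch assumes the equal-variance (pooled) form of Student's t-test and stops at the t statistic and degrees of freedom; the two-tailed P value would then be read from a t distribution.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def student_t_unpaired(a, b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, na + nb - 2


# Hypothetical triplicate measurements
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
print(mean_sem(control))
print(student_t_unpaired(control, treated))
```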
Sporadic autism exomes reveal a highly 
interconnected protein network of de novo mutations 


Brian J. O’Roak, Laura Vives, Santhosh Girirajan, Emre Karakoc, Niklas Krumm, Bradley P. Coe, Roie Levy, Arthur Ko, Choli Lee, 
Joshua D. Smith, Emily H. Turner, Ian B. Stanaway, Benjamin Vernot, Maika Malig, Carl Baker, Beau Reilly, Joshua M. Akey, 
Elhanan Borenstein, Mark J. Rieder, Deborah A. Nickerson, Raphael Bernier, Jay Shendure & Evan E. Eichler 


It is well established that autism spectrum disorders (ASD) have a 
strong genetic component; however, for at least 70% of cases, the 
underlying genetic cause is unknown’. Under the hypothesis that 
de novo mutations underlie a substantial fraction of the risk for 
developing ASD in families with no previous history of ASD or 
related phenotypes—so-called sporadic or simplex families**—we 
sequenced all coding regions of the genome (the exome) for 
parent-child trios exhibiting sporadic ASD, including 189 new 
trios and 20 that were previously reported*. We also 
sequenced the exomes of 50 unaffected siblings corresponding to 
these new (n = 31) and previously reported trios (n = 19)*, for a 
total of 677 individual exomes from 209 families. Here we show 
that de novo point mutations are overwhelmingly paternal in 
origin (4:1 bias) and positively correlated with paternal age, con- 
sistent with the modest increased risk for children of older fathers 
to develop ASD*. Moreover, 39% (49 of 126) of the most severe or 
disruptive de novo mutations map to a highly interconnected 
β-catenin/chromatin remodelling protein network ranked significantly 
for autism candidate genes. In proband exomes, recurrent 
protein-altering mutations were observed in two genes: CHD8 and 
NTNG1. Mutation screening of six candidate genes in 1,703 ASD 
probands identified additional de novo, protein-altering mutations 
in GRIN2B, LAMC3 and SCN1A. Combined with copy 
number variant (CNV) data, these results indicate extreme locus 
heterogeneity but also provide a target for future discovery, 
diagnostics and therapeutics. 

We selected 189 autism trios from the Simons Simplex Collection 
(SSC)°, which included males significantly impaired with autism and 
intellectual disability (n = 47), a female sample set (n = 56) of which 
26 were cognitively impaired, and samples chosen at random from the 
remaining males in the collection (n = 86) (Supplementary Table 1 
and Supplementary Fig. 1). In general, we excluded samples known to 
carry large de novo CNVs’. Exome sequencing was performed as 
described previously’, but with an expanded target definition (see 
Methods). We achieved sufficient coverage for both parents and child 
to call genotypes for, on average, 29.5 megabases (Mb) of haploid 
exome coding sequence (Supplementary Table 1). In addition, we 
performed copy number analysis on 122 of these families, using a 
combination of the exome data, array comparative genomic hybrid- 
ization (CGH), and genotyping arrays, thereby providing a more com- 
prehensive view of rare variation. 

In the 189 new probands, we validated 248 de novo events: 225 single 
nucleotide variants (SNVs), 17 small insertions/deletions (indels), and 
six CNVs (Supplementary Table 2). These included 181 non-synonymous 
changes, of which 120 were classified as severe based 
on sequence conservation and/or biochemical properties (Methods 
and Supplementary Table 3). The observed point mutation rate in 
coding sequence was ~1.3 events per trio or 2.17 × 10⁻⁸ per base 
per generation, in close agreement with our previous observations", 
yet generally higher than in previous studies, indicating increased 
sensitivity (Supplementary Table 2 and Supplementary Table 4)’. 
We also observed complex classes of de novo mutation including: five 
cases of multiple mutations in close proximity; two events consistent 
with paternal germline mosaicism (that is, where both siblings con- 
tained a de novo event observed in neither parent); and nine events 
showing a weak minor allele profile consistent with somatic mosaicism 
(Supplementary Table 3 and Supplementary Figs 2 and 3). 
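
The per-base rate quoted above can be checked with simple arithmetic. The sketch below assumes that point mutations means SNVs plus small indels, and that each child contributes two haploid copies of the roughly 29.5 Mb of callable coding sequence; both assumptions are inferred from the reported figures rather than stated in the text.

```python
# Counts reported in the text
snvs = 225
indels = 17
trios = 189
haploid_callable_bases = 29.5e6  # average callable haploid exome per trio

# Assumption: point mutations = SNVs + small indels
point_mutations = snvs + indels
events_per_trio = point_mutations / trios

# Assumption: a diploid child carries two copies of each callable base
rate_per_base = events_per_trio / (2 * haploid_callable_bases)

print(f"{events_per_trio:.2f} events per trio")
print(f"{rate_per_base:.2e} per base per generation")
```

Under these assumptions the result is about 1.28 events per trio and about 2.17 × 10⁻⁸ per base per generation.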

Of the severe de novo events, 28% (33 of 120) are predicted to 
truncate the protein. The distribution of synonymous, missense and 
nonsense changes corresponds well with a random mutation model’ 
(Supplementary Fig. 4 and Supplementary Table 2). However, the 
difference in nonsense rates between de novo and rare singleton events 
(not present in 1,779 other exomes) is striking (4:1) and suggests 
strong selection against new nonsense events (Fisher’s exact test, 
P<0.0001). In contrast with a recent report®, we find no significant 
difference in mutation rate between affected and unaffected indivi- 
duals; however, we do observe a trend towards increased non- 
synonymous rates in probands, consistent with the findings of ref. 9 
(Supplementary Tables 1 and 2). 

Given the association of ASD with increased paternal age’ and our 
previous observations’, we used molecular cloning, read-pair informa- 
tion, and obligate carrier status to identify informative markers linked 
to 51 de novo events and observed a marked paternal bias (41:10; 
binomial P< 1.4 X 10°; Fig. 1a and Supplementary Tables 3 and 5). 
This provides strong direct evidence that the germline mutation rate in 
protein-coding regions is, on average, substantially higher in males. A 
similar finding was recently reported for de novo CNVs"°. In addition, 
we observe that the number of de novo events is positively correlated 
with increasing paternal age (Spearman’s rank correlation = 0.19; 
P<0.008; Fig. 1b). Together, these observations are consistent with 
the hypothesis that the modest increased risk for children of older 
fathers to develop ASD* is the result of an increased mutation rate. 
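
The parent-of-origin bias can be tested against an even split with an exact binomial tail. This is a sketch, not the authors' procedure: the text does not say whether the test was one-sided or two-sided, so both are computed, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact probability of observing at least `successes` out of
    `trials` when each event has probability p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


paternal, phased = 41, 51  # counts reported in the text
one_sided = binomial_upper_tail(paternal, phased)
two_sided = min(1.0, 2 * one_sided)  # symmetric because p = 0.5

print(f"one-sided P = {one_sided:.2e}")
print(f"two-sided P = {two_sided:.2e}")
```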

Using sequence read-depth methods in 122 of the 189 families, we 
scanned ASD probands for either de novo CNVs or rare (<1% of 
controls), inherited CNVs. Individual events were validated by either 
array CGH or genotyping array (see Methods). We identified 76 events 
in 53 individuals, including six de novo (median size 467 kilobases 
(kb)) and 70 inherited (median size 155kb) CNVs (Supplementary 
Table 6). These include disruptions of EHMT1 (Kleefstra’s syndrome, 
Online Mendelian Inheritance in Man (OMIM) accession 610253), 
CNTNAP4 (reported in children with developmental delay and aut- 
ism'') and the 16p11.2 duplication (OMIM 611913) associated with 
developmental delay, bipolar disorder and schizophrenia. 

We performed a multivariate analysis on non-verbal IQ (NVIQ), 
verbal IQ (VIQ) and the load of ‘extreme’ de novo mutations, where 
extreme is defined as point mutations that truncate proteins, intersect 
Mendelian or ASD loci (n = 57), or de novo CNVs that intersect genes 
(n = 5) (Fig. 1c and Supplementary Discussion). NVIQ, but not VIQ, 
decreased significantly (P < 0.01) with increased number of events. 
Covariant analysis of the samples with CNV data showed that this 
finding was strengthened, but not exclusively driven, by the presence 
of either de novo or rare CNVs (Supplementary Fig. 5). 

Among the de novo events, we identified 62 top ASD risk contributing 
mutations based on the deleteriousness of the mutations, 
functional evidence, or previous studies (Table 1). Probands with these 
mutations spanned the range of IQ scores, with only a modest 
non-significant trend towards individuals co-morbid with intellectual 
disability (Supplementary Figs 1 and 6). We observed recurrent, 
protein-disruptive mutations in two genes: NTNG1 (netrin G1) and 
CHD8 (chromodomain helicase DNA binding protein 8). Given their 
locus-specific mutation rates, the probability of identifying two 
independent mutations in our sample set is low (uncorrected, NTNG1: 
P<12xX10 ° CHD8 P<69X10 °) (Supplementary Fig. 7, 
Supplementary Table 8 and Methods). NTNG1 is a strong biological 
candidate given its role in laminar organization of dendrites and axonal 
guidance’’ and was also reported as being disrupted by a de novo 
translocation in a child with Rett’s syndrome, without MECP2 mutation’’. 
Both de novo mutations identified here are missense (p.Tyr23Cys and 
p.Thr135Ile) at highly conserved positions predicted to disrupt protein 
function, although there is evidence of mosaicism for the former 
mutation (Supplementary Table 3). 

CHD8 has not previously been associated with ASD and codes for 
an ATP-dependent chromatin-remodelling factor that has a significant 
role in the regulation of both β-catenin and p53 signalling’*"*. We 
also identified de novo missense variants in CHD3 as well as CHD7 
(CHARGE syndrome, OMIM 214800), a known binding partner of 
CHD8 (ref. 16). ASD has been found in as many as two-thirds of 
children with CHARGE, indicating that CHD7 may contribute to an 
ASD syndromic subtype”. 

We identified 30 protein-altering de novo events intersecting with 
Mendelian disease loci (Supplementary Table 3) as well as inherited 
hemizygous mutations of clinical significance (Supplementary Table 9). 
The de novo mutations included truncating events in syndromic 
intellectual disability genes (MBD5 (mental retardation, autosomal 
dominant 1, OMIM 156200), RPS6KA3 (Coffin-Lowry syndrome, 
OMIM 303600) and DYRK1A (the Down’s syndrome candidate 
gene, OMIM 600855)), and missense variants in loci associated with 
syndromic ASD, including CHD7, PTEN (macrocephaly/autism 
syndrome, OMIM 605309) and TSC2 (tuberous sclerosis complex, 
OMIM 613254). Notably, DYRK1A is a highly conserved gene 
mapping to the Down’s syndrome critical region (Supplementary 
Fig. 8). The proband here (13890) is severely cognitively impaired 
and microcephalic, consistent with previous studies of DYRK1A 
haploinsufficiency in both patients and mouse models”*. 

Figure 1 | De novo mutation events in autism spectrum disorder. 
a, Haplotype phasing using informative markers shows a strong 
parent-of-origin bias with 41 of 51 de novo events occurring on the paternally 
inherited haplotype. Arrows represent sequence reads from paternal (blue) or 
maternal (red) haplotypes. b, c, Box and whisker plots for 189 SSC probands. 
b, The paternal estimated age at conception versus the number of observed 
de novo point mutations (0, n = 53; 1, n = 65; 2, n = 44; 3+, n = 27). 
c, Decreased non-verbal IQ is significantly associated with an increasing 
number of extreme mutation events (0, n = 138; 1, n = 41; 2+, n = 10), both 
with and without CNVs (Supplementary Discussion). d, Browser images showing 
CNVs identified in the del(18)(q12.2q21.1) syndrome region. The truncating 
point mutation in SETBP1 occurs within the critical region, identifying the 
likely causative locus. Each red (deletion) and green (duplication) line 
represents an identified CNV in cases (solid lines) versus controls (dashed 
lines), with arrowheads showing point mutation. 

Twenty-one of the non-synonymous de novo mutations map to 
CNV regions recurrently identified in children with developmental 
delay and ASD (Supplementary Table 10), such as MBD5 (2q23.1 dele- 
tion syndrome), SYNRG (17q12 deletion syndrome) and POLRMT 
(19p13.3 deletion)’’. There is also considerable overlap with genes dis- 
rupted by single de novo CNVs in children with ASD (for example, 
NLGN1 and ARID1B; Supplementary Table 11). Given the prior 
probability that these loci underlie genomic disorders, the disruptive 
de novo SNVs and small indels may be pinpointing the possible major 
effect locus for ASD-related features. For example, we identified a complex 
de novo mutation resulting in truncation of SETBP1 (SET binding 
protein 1), one of five genes in the critical region for del(18)(q12.2q21.1) 
syndrome (Fig. 1d), which is characterized by hypotonia, expressive 
language delay, short stature and behavioural problems”. Recurrent 
de novo missense mutations at SETBP1 were recently reported to be 
causative for a distinct phenotype, Schinzel-Giedion syndrome, 
probably through a gain-of-function mechanism”, indicating diverse 
phenotypic outcomes at this locus depending on mutation mechanism. 
Several of the mutated genes encode proteins that directly interact, 
suggesting a common biological pathway. From our full list of genes 
carrying truncating or severe missense mutations (126 events from all 
209 families), we generated a protein-protein interaction (PPI) net- 
work based on a database of physical interactions (Supplementary 
Table 12)”. We found 39% (49 of 126) of the genes mapped to a highly 
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Table 1 | Top de novo ASD risk contributing mutations 


Proband NVIQ Candidate gene Amino acid change 
2225.p1 89 ABCA2 p.Vall1845Met 
1653.p1 44 ADCY5 p.Arg603Cys 
2130.p1 55 ADNP. Frameshift indel 
1224.p1 112 AP3B2 p.Arg435His 
3447.p1 51 ARID1B Frameshift indel 
3415.p1 48 BRSK2 3n indel 
4292.p1 49 BRWD1 Frameshift indel 
1872.p1 65 CACNA1D p.Ala769Gly 
1773.p1 50 CACNAIE p.Gly1209Ser 
3606.p1 60 CDC42BPB p.Arg764TERM 
2086.p1 108 CDH5 p.Arg545Trp 
12630.p1 115 CHD3 p.Arg1818Trp 
13733.p1 68 CHD7 p.Gly996Ser 
13844.p1 34 CHD8 p.GIn959TERM 
12752.p1 93 CHD8 Frameshift indel 
13415.p1 48 CNOT4 p.Asp48Asn 
12703.p1 58 CTNNB1 p.Thr551Met 
11452.p1 80 CUL3 p.Glu246TER 
11571.p1 94 CULS p.Val355lle 
13890.p1 42 DYRK1A Splice site 
12741.p1 87 EHD2 p.Argl67Cys 
11629.p1 67 FBXO10 p.Glu54Lys 
13629.p1 63 GPS1 p.Arg492GlIn 
13757.p1 91 GRINL1A 3n indel 
11184.p1 94 HDGFRP2 p.Glu83Lys 
11610.p1 138 HDLBP p.Ala639Ser 
11872.p1 65 KATNAL2 Splice site 
12346.p1 77 MBD5 Frameshift indel 
11947.p1 33 MDM2 p.Glu433Lys/p.Trp160TERM 
11148.p1 82 MLL3 p.Tyr4691TERM 
12157.p1 91 NLGN1 p.His795Tyr 
11193.p1 138 NOTCH3 p.Gly1134Arg 
11172.p1 60 NR4A2 p.Tyr275His 
11660.p1 60 NTNG1 p.Thr135lle 
12532.p 110 NTNG1 p.Tyr23Cys 
11093.p 91 OPRL1 p.Arg157Cys 
13793.p 56 PCDHB4 p.Asp555His 
11707.p 23 PDCD1 Frameshift indel 
12304.p 83 PSEN1 p.Thr42 1lle 
11390.p 77 PTEN p.Thr167Asn 
13629.p 63 PTPRK p.Arg784His 
13333.p 69 RGMA p.Val379lle 
13222.p 86 RPS6KA3 p.Ser369TERM 
11257.p 128 RUVBL1 p.Leu365GIn 
11843.p 113 SESN2 p.Ala46Thr 
12933.p 41 SETBP1 Frameshift indel 
12565.p 79 SETD2 Frameshift indel 
12335.p 47 TBL1XR1 p.Leu282Pro 
11480.p 41 TBR1 Frameshift indel 
11569.p 67 TNKS p.Arg568Thr 
12621.p 120 TSC2 p.Arg1580Trp 
11291.p 83 TSPAN17 p.Ser75TERM 
11006.p 125 UBE3C p.Ser845Phe 
12161.p 95 UBR3 Frameshift indel 
12521.p 78 USP15 Frameshift indel 
11526.p 92 ZBTB41 p.Tyr886His 
13335.p 25 ZNF420 p.Leu76Pro 
CNV 
Proband NVIQ Candidate gene Type 
11928.p1 66 CHRNA7 Duplication 
13815.p1 56 CNTNAP4 Deletion 
13726.p1 59 CTNND1 Deletion 
12581.p1 34 EHMT1 Deletion 
13335.p1 25 TBX6 Duplication 


Top candidate mutations based on severity and/or supporting evidence from the literature. 


interconnected network wherein 92% of gene pairs in the connected 
component are linked by paths of three or fewer edges (Fig. 2a). We 
tested this degree of interconnectivity by simulation (n = 10,000 repli- 
cates; Methods and Supplementary Fig. 9) and found that our experi- 
mental network had significantly more edges (P< 0.0001) and a 
greater clustering coefficient (P < 0.0001) than expected by chance. 
To investigate the relevance of this network to autism further, we 
applied degree-aware disease gene prioritization (DADA), based on 
the same PPI database, to rank all genes by their relatedness to a 
set of 103 previously identified ASD genes (ref. 17). We found that the genes 
with severe mutations ranked significantly higher than all other genes 
(Mann-Whitney U-test, P < 4.0 × 10^-4), suggesting enrichment of 
ASD candidates. Furthermore, the 49 members of the connected com- 
ponent overwhelmingly drove this difference (Mann-Whitney U-test, 
P<16X 10 °), as the unconnected members were not significant on 
their own (Mann-Whitney U-test, P< 0.28), increasing our confid- 
ence that these connected gene products are probably related to ASD 
(Supplementary Fig. 10). Consistent with this finding, the rankings of 
unaffected sibling events are highly similar to the unconnected com- 
ponent, strengthening our confidence in the enrichment of the con- 
nected component of proband events for ASD-relevant genes. 

Members of this network have known functions in β-catenin and 
p53 signalling, chromatin remodelling, ubiquitination and neuronal 
development (Fig. 2a). A fundamental developmental regulator 
observed in the network is CTNNB1 (catenin (cadherin-associated 
protein), β1, 88 kDa), also known as β-catenin. Interestingly, a parallel 
analysis using ingenuity pathway analysis (IPA) shows an enrichment 
of upstream interacting genes of the β-catenin pathway (8 of 358, 
P = 0.0030; see Methods, Supplementary Table 13 and Supplementary 
Fig. 11). A role for Wnt/β-catenin signalling in ASD was previously 
proposed, largely on the basis of the association of common 
variants in EN2 and WNT2, and the high rate of children with 
macrocephaly. It is striking that both individuals with CHD8 mutations 
in this study have multiple de novo disruptive missense mutations 
in this pathway or closely related pathways (Fig. 2b, c and 
Supplementary Fig. 12) and both have macrocephaly. 

In addition, the pathway analysis shows several other disrupted genes 
not identified in the PPI that are involved in common pathways, which 
in some cases are linked to β-catenin (Supplementary Discussion and 
Supplementary Fig. 11). TBR1, for example, is a transcription factor that 
has a critical role in the development of the cerebral cortex. TBR1 
binds with CASK and regulates several candidate genes for ASD and 
intellectual disability including GRIN2B, AUTS2 and RELN, genes of 
recurrent ASD mutation, some of which are described here and in 
other studies. 

Our exome analysis of de novo coding mutations in 209 autism trios 
identified only two recurrently altered genes, consistent with extreme 
locus heterogeneity underlying ASD. This extreme heterogeneity 
necessitates the analysis of very large cohorts for validation. We imple- 
mented a cost-effective approach based on molecular inversion probe 
(MIP) technology for the targeted resequencing of six candidate 
genes in ~2,500 individuals, including 1,703 simplex ASD probands 
and 744 controls. Four of these candidates (FOXP1, GRIN2B, LAMC3 
and SCN1A) were identified previously*, whereas two (FOXP2, OMIM 
602081 and GRIN2A, OMIM 613971) are related genes implicated in 
other neurodevelopmental phenotypes. We identified all previously 
observed de novo events (that is, in the same individuals), as well as 
additional de novo events in GRIN2B (two protein-truncating events), 
SCN1A (a missense) and LAMC3 (a missense) (Supplementary Table 8). 
The observed number of de novo events was compared with expecta- 
tions based on the mutation rates estimated for each gene (Methods 
and Supplementary Table 8), with GRIN2B showing the highest sig- 
nificance (uncorrected P value <0.0002). Notably, the three de novo 
events observed in GRIN2B are all predicted to be protein truncating, 
whereas no events truncating GRIN2B were found in more than 3,000 
controls (Methods). 

Our analysis predicts extreme locus heterogeneity underlying the 
genetic aetiology of autism. Under a strict sporadic disorder-de novo 
mutation model, if 20-30% of our de novo point mutations are con- 
sidered to be pathogenic, we can estimate between 384 and 821 loci 
(Methods and Supplementary Fig. 13). We reach a similar estimate if 
we consider recurrences from ref. 9. It is clear from phenotype and 
genotype data that there are many ‘autisms’ represented under the 
current umbrella of ASD and other genetic models are more likely 
in different contexts (for example, families with multiple affected 

Figure 2 | Mutations identified in protein-protein interaction (PPI) 
networks. a, The 49-gene connected component of the PPI network formed 
from 126 genes with severe de novo mutations among the 209 probands. 

b, Proband 13844 inherits three rare gene-disruptive CNVs and carries two de 


individuals). There is marked convergence on genes previously impli- 
cated in intellectual disability and developmental delay. As has been 
noted for CNVs, this indicates that nosological divisions may not 
readily translate into differences at the molecular level. We believe that 
there is value in comparing mutation patterns in children with 
developmental delay (without features of autism) to those in children 
with ASD. 

Although there is no one major genetic lesion responsible for ASD, 
it is still largely unknown whether there are subsets of individuals with 
a common or strongly related molecular aetiology and how large these 
subsets are likely to be. Using gene expression, protein-protein inter- 
actions, and CNV pathway analysis, recent reports have highlighted 
the role of synapse formation and maintenance. We find it 
intriguing that 49 proteins found to be mutated here have critical roles 
in fundamental developmental pathways, including β-catenin and p53 
signalling, and that patients have been identified with multiple 
disruptive de novo mutations in interconnected pathways. The latter 
observations are consistent with an oligogenic model of autism where 
both de novo and extremely rare inherited SNV and CNV mutations 
contribute in conjunction to the overall genetic risk. Recent work has 
supported a role for these interconnected pathways in neuronal stem- 
cell fate-determination, differentiation and synaptic formation in 
humans and animal models. Given that fundamental developmental 
processes have previously been found to underlie syndromic 
forms of autism, a wider role of these pathways in idiopathic ASD 
would not be entirely surprising and would help explain the extreme 
genetic heterogeneity observed in this study. 


METHODS SUMMARY 

Exome capture, alignments and base-calling. Genomic DNA was derived 
directly from whole blood. Exomes were considered to be completed when 
~90% of the capture target exceeded 8-fold coverage and ~80% exceeded 20-fold 
coverage. Exomes for the 189 trios (and 31 unaffected siblings) were captured with 
NimbleGen EZ Exome V2.0. Reads were mapped as in ref. 4 to a custom reference 
genome assembly (GRC build37). Genotypes were generated with the GATK Unified 
Genotyper and a parallel SAMtools pipeline. Exomes for the unaffected siblings 
matching the pilot trios were captured and analysed as in ref. 4. Predicted de novo 
events were called as in ref. 4 and confirmed by capillary sequencing in all family 
members (for 176 of the 189 trios, this also included one unaffected sibling). 
Mutations were considered severe if they were truncating, missense with 
Grantham score ≥50 and GERP score ≥3 or only Grantham score ≥85, or deleted 
a highly conserved amino acid. 

Exome read-depth CNV analysis. Reads were mapped using mrsFAST and 
normalized reads per kilobase of exon per million mapped reads (RPKM) values 

novo truncating mutations. c, GeneMANIA view of three of the affected genes 
(b) (red labels) which encode proteins that are part of a β-catenin-linked 
network. This proband is macrocephalic, impaired cognitively, and has deficits 
in social behaviour and language development (Supplementary Discussion). 


calculated by exon. Population normalization was performed using a set of 366 
non-ASD exomes. Calls were made if three or more exons passed a threshold value, 
and calls were cross-validated using two orthogonal platforms: custom array CGH and 
Illumina 1M array data. CNVs were filtered to identify de novo and rare inherited 
events by comparison with 2,090 controls and 1,651 parent profiles. 

Network reconstruction and null model estimation. PPI networks were generated 
using physical interaction data from GeneMANIA. Null models were estimated 
using gene-specific mutation rate estimates based on human-chimp divergence. To 
rank candidate genes we obtained the seed ASD list from ref. 17 and severe disruptive 
de novo events from all families (n = 209). Given the PPI network and seed 
gene product list, we used DADA for ranking each gene. 

Human subjects. All samples and phenotypic data were collected under the 
direction of the Simons Simplex Collection by its 12 research clinic sites (http:// 
sfari.org/sfari-initiatives/simons-simplex-collection). Parents consented and children 
assented as required by each local institutional review board. Participants 
were de-identified before distribution. Research was approved by the University 
of Washington Human Subject Division under non-identifiable biological 
specimens/data. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Exome capture, alignments and base-calling. Exomes for the 189 trios (and 31 
unaffected siblings) were captured with NimbleGen EZ Exome V2.0. Final 
libraries were then sequenced on either an Illumina GAIIx (paired- or single- 
end 76-bp reads) or HiSeq2000 (paired- or single-end 50-bp reads). Reads were 
mapped to a custom GRCh37/hg19 build using BWA 0.5.6 (ref. 32). Read qualities 
were recalibrated using GATK Table Recalibration 1.0.2905 (ref. 33). Picard-tools 
1.14 was used to flag duplicate reads (http://picard.sourceforge.net/). GATK 
IndelRealigner 1.0.2905 was used to realign reads around insertion/deletion 
(indel) sites. Genotypes were generated with GATK Unified Genotyper with 
FILTER = “QUAL <= 50.0 || AB >= 0.75 || HRun > 3 || QD < 5.0” and in parallel 
with the SAMtools pipeline as described previously (ref. 4). Only positions with at least 
eightfold coverage were considered. All pilot sibling exomes were captured and 
analysed as described previously (ref. 4). Predicted de novo events were called and compared 
against a set of 946 other exomes to remove recurrent artefacts and likely 
undercalled sites. Indels were also called with the GATK Unified Genotyper and 
SAMtools and filtered to those with at least 25% of reads showing a variant at a 
minimum depth of 8×. Mutations were phased using molecular cloning of PCR 
fragments, read-pair information, linked informative SNPs, and obligate carrier 
status. To identify rare private variants (singleton), the full variant list was com- 
pared against a larger set of 1,779 other exomes. Predicted de novo indels were also 
filtered against this larger set. 

Sanger validations. All reported de novo events (exome or MIP capture) were 
validated by designing primers with BatchPrimer3 followed by PCR amplification 
and Sanger sequencing. We performed PCR reactions using 10 ng of DNA from 
father, mother, unaffected sibling (when available), and proband and performed 
Sanger capillary sequencing of the PCR product using forward and reverse primers. 
In some cases, one direction could not be assessed due to the presence of repeat 
elements or indels in close proximity to the mutation event. 

Mutation candidate gene analysis. We examined whether each non-synonymous 
or CNV de novo event may be contributing to the aetiology of ASD by evaluating 
the likely deleteriousness of the change (GERP, Grantham score) and 
intersecting with known syndromic and non-syndromic candidate genes, CNV 
morbidity maps, and information in OMIM and PubMed. Mutations were considered 
severe if they were truncating, missense with Grantham score ≥50 and 
GERP score ≥3 or only Grantham score ≥85, or deleted a highly conserved amino 
acid. For genes that had not previously been implicated in ASD, we gave priority to 
those with structural similarities to known candidates or strong evidence of neural 
function or development. 
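
The severity rule described above can be sketched as a small predicate. This is a minimal illustration and not the authors' pipeline: the class and field names are hypothetical, the thresholds are read as lower bounds (Grantham score of at least 50 together with GERP of at least 3, or Grantham of at least 85 alone), and treating nonsense, frameshift and splice-site changes as 'truncating' is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mutation:
    """Hypothetical record for one de novo coding event."""
    truncating: bool = False                  # assumed: nonsense, frameshift or splice site
    grantham: Optional[float] = None          # Grantham score (missense only)
    gerp: Optional[float] = None              # GERP conservation score
    deletes_conserved_residue: bool = False   # in-frame loss of a highly conserved amino acid


def is_severe(m: Mutation) -> bool:
    """Apply the severity rule: truncating, or damaging missense, or conserved deletion."""
    if m.truncating or m.deletes_conserved_residue:
        return True
    if m.grantham is None:
        return False                          # not a scored missense change
    if m.grantham >= 85:
        return True                           # radical substitution, GERP not required
    return m.grantham >= 50 and m.gerp is not None and m.gerp >= 3


if __name__ == "__main__":
    print(is_severe(Mutation(grantham=60, gerp=4.2)))   # moderate change at a conserved site
    print(is_severe(Mutation(grantham=60, gerp=1.0)))   # moderate change at a variable site
```

Keeping the rule in one function makes it easy to see that GERP only matters for missense changes in the intermediate Grantham range.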

Exome read-depth CNV discovery. To find CNVs using exome read-depth data, 
we first mapped sequenced reads to the hg19 exome using the mrsFAST aligner. 
Next, we applied a novel method (N.K. et al., manuscript in preparation), which 
uses normalized RPKM values of the ~194,000 captured exons/sequences, 
subsequent population normalization using 366 exomes from the Exome 
Sequencing Project and singular value decomposition to remove systematic bias 
present within exome capture reactions. Rare CNVs were detected using a 
threshold cutoff of the normalized RPKM values, and we required at least three 
exons above our threshold in order to make a call. We made a total of 1,077 
deletion or duplication calls in 366 individuals (range 0-14, median = 3, 
mean = 2.94). 
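
As a rough sketch of the calling step only: RPKM is computed per exon from its standard definition, and a call is emitted when at least three exons in a row pass the threshold. The population normalization and singular value decomposition are not reproduced here; `scores` stands for per-exon values that have already been normalized, the function names are hypothetical, and the requirement that the exons be consecutive and share the same direction is an assumption.

```python
from typing import List, Sequence, Tuple


def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def call_cnvs(scores: Sequence[float], threshold: float,
              min_exons: int = 3) -> List[Tuple[int, int, str]]:
    """Return (first_exon, last_exon, kind) for runs of exons beyond +/- threshold.

    `threshold` must be positive; scores >= threshold count towards a
    duplication, scores <= -threshold towards a deletion.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    calls = []
    start, kind = 0, None
    # A trailing 0.0 acts as a sentinel that closes a run ending at the last exon.
    for i, value in enumerate(list(scores) + [0.0]):
        if value >= threshold:
            current = "dup"
        elif value <= -threshold:
            current = "del"
        else:
            current = None
        if current != kind:
            if kind is not None and i - start >= min_exons:
                calls.append((start, i - 1, kind))
            start, kind = i, current
    return calls


if __name__ == "__main__":
    print(rpkm(200, 1000, 20_000_000))
    print(call_cnvs([0.1, 2.5, 3.0, 2.2, 0.0, -2.4, -2.1, 0.3], threshold=2.0))
```

In the second example only the three-exon gain is reported; the two-exon loss falls below the minimum run length.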

CNV detection using array CGH. A custom-targeted 2 × 400K Agilent chip with 
median probe spacing of 500 bp in the genomic hotspots flanked by segmental 
duplications or Alu repeats and probe spacing of 14 kb in the genomic backbone 
was designed. All experiments were performed according to the manufacturer’s 
instructions using NA12878 as the female reference and NA18507 as the male 
reference (Coriell). Data analysis was performed following feature extraction using 
DNA analytics with ADM-2 setting. All CNV calls were visually inspected in the 
UCSC Genome Browser. CNV calls from probands were then intersected with 
those from parents and also with 377 controls recruited through the NIMH Genetics 
Initiative and the ClinSeq cohort, analysed on the same microarray platform. The 
NIMH set of controls was ascertained by the NIMH Genetics Initiative through 
an online self-report based on the Composite International Diagnostic Instrument 
Short-Form (CIDI-SF). Those who did not meet DSM-IV criteria for major 
depression, denied a history of bipolar disorder or psychosis, and reported exclusively 
European origins were included. Samples from the ClinSeq cohort were 
selected from a population representing a spectrum of atherosclerotic heart 
disease. De novo and inherited potential pathogenic CNVs were selected only 
if they intersected with RefSeq coding sequence and allowing for a frequency of 
<1% in the controls and <50% segmental duplication content. 

Illumina array CNV calling. CNV calling was performed in hg18 as described 
previously, using an HMM that incorporates both B-allele frequencies (BAF) and 
total intensity values (logR). In total, we generated CNV calls for 841 probands, 
1,651 parents and 793 siblings including the samples reported recently. Of the 122 
families selected for CNV comparisons in this study, calls were generated for 107 
probands. Of these, both parents were profiled for 101 families and one parent was 
profiled for the remaining six families. In addition, at least one sibling was profiled 
for 99 of these families. 

Independent of array CGH detection, to identify putatively pathogenic CNVs, 
we first compared our data to 2,090 control samples derived from the Wellcome 
Trust Case Control Consortium (WTCCC) National Blood Services Cohort 
and filtered all CNVs present in 1% (20) of WTCCC2 controls or 1% (16) of 
parents by 50% reciprocal overlap with matching copy number status. In addition, 
similar to the filtering criteria used for array CGH detection, we selected only 
CNVs that contained less than 50% segmental duplication and intersected with 
RefSeq coding sequence. To select putative de novo CNVs, we further required the 
CNV not to be present in family-matched parents and siblings. Additionally, we 
filtered CNVs present in >0.1% (2) of the full 1,651 parent set. To select potential, 
rare inherited events, we required the CNV be detected in a matched parent or 
sibling. Finally, we filtered the genes inside each CNV under the same criteria (to 
account for smaller or larger CNPs) and removed CNVs with no remaining genes. 
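
The 50% reciprocal overlap test with matching copy number status can be sketched as follows. This is an illustrative sketch rather than the authors' code: the `CNV` tuple, the half-open coordinate convention and the helper names are assumptions introduced here.

```python
from typing import NamedTuple, Sequence


class CNV(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    kind: str    # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV, min_fraction: float = 0.5) -> bool:
    """True if a and b share copy number status and each covers >= min_fraction of the other."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared >= min_fraction * (a.end - a.start)
            and shared >= min_fraction * (b.end - b.start))


def carrier_count(query: CNV, cohort: Sequence[Sequence[CNV]]) -> int:
    """Number of individuals in `cohort` with at least one CNV matching `query`."""
    return sum(any(reciprocal_overlap(query, c) for c in calls) for calls in cohort)


if __name__ == "__main__":
    proband_cnv = CNV("chr1", 100, 200, "del")
    controls = [[CNV("chr1", 120, 210, "del")], [], [CNV("chr2", 100, 200, "del")]]
    print(carrier_count(proband_cnv, controls))
```

A CNV would then be dropped when `carrier_count` reaches the cohort-specific cutoff quoted in the text (for example 20 WTCCC2 controls or 16 parents).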

CNV cross validation. High-confidence, cross-validated de novo and inherited 
CNVs were selected by identifying events detected by at least two of three 
methodologies. To account for the variable breakpoint definitions in array 
CGH, SNP arrays, and exome copy number profiles, we aligned the CNVs by at 
least one overlapping gene ID and reported each CNV region by its maximal outer 
boundaries. This identified six de novo and 70 rare inherited events for further 
study (Supplementary Table 6). 

Ingenuity pathway analysis. Ingenuity pathway analysis (IPA) was performed to 
identify potential functional enrichments within both our PPI (49 genes) and 
overall set of 126 genes. RefSeq reference gene list was used as a background list 
for all analysis. To confirm our results pertaining to CTNNB1 upstream enrich- 
ment, we simulated 10,000 random populations of 209 individuals using Poisson 
priors for each gene based on their estimated mutation rates (see below), with a 
global correction factor resulting in selecting a mean of 126 genes per population. 
We then used this simulation data to calculate the probability of observing eight 
direct upstream interactors of CTNNB1 and determined that our data set is 
enriched for these genes with P = 0.0030. 

Estimating locus-specific mutation rates. Human-chimpanzee alignments were 
downloaded from the UCSC Genome Browser (reference versions GRCh37 and 
panTro2, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsPanTro2/syntenicNet/). 
The more conservative syntenicNet alignments were used (details in http:// 
hgdownload.cse.ucsc.edu/goldenPath/hg19/vsPanTro2/README.txt). Gene defi- 
nitions were downloaded from the UCSC Table Browser, from the RefSeq Genes 
track, and the refFlat table. Exons were extended by 2 bp, and overlapping exons 
were merged using BEDTools. Non-exonic sequence was not considered. For each 
gene, we extracted: (1) d= the number of differences between chimpanzee and 
human; and (2) n = the number of bases aligned. We assumed a divergence time 
between human and chimpanzee of 12 million years (Myr) and an average genera- 
tion time of 25 years. We then calculated gene-specific mutation rates per site per 
generation: r = (d/n)/(12 Myr/25 years/generation). We calculated the probability 
of observing X or more events using the Poisson distribution defined by the number of 
chromosomes screened and the size of the coding region, including actual splice 
bases. 
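
The rate formula and the Poisson tail probability can be worked through in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and taking the Poisson mean as rate × coding bases × chromosomes screened is one reading of the description above, not a confirmed detail of the authors' calculation.

```python
import math

DIVERGENCE_YEARS = 12e6   # human-chimpanzee divergence time assumed in the text
GENERATION_YEARS = 25     # average generation time assumed in the text


def mutation_rate(d: int, n: int) -> float:
    """Per-site, per-generation rate: r = (d/n) / (12 Myr / 25 years per generation)."""
    generations = DIVERGENCE_YEARS / GENERATION_YEARS   # 480,000 generations
    return (d / n) / generations


def expected_events(rate: float, coding_bases: int, chromosomes: int) -> float:
    """Assumed Poisson mean: rate x size of coding region x chromosomes screened."""
    return rate * coding_bases * chromosomes


def prob_at_least(x: int, lam: float) -> float:
    """P(X >= x) for X ~ Poisson(lam), via the complement of the lower tail."""
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(x))
    return 1.0 - below


if __name__ == "__main__":
    r = mutation_rate(d=12, n=1000)             # illustrative counts, not from the paper
    lam = expected_events(r, coding_bases=4500, chromosomes=2 * 1703)
    print(r, lam, prob_at_least(3, lam))
```

The example inputs are made up for illustration; only the divergence time, generation time and the form of r come from the text.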

Network simulation and null model estimation. To generate a null distribution 
of gene mutations, de novo mutation rates were estimated from human-chimp 
mutation rates. A pseudocount of 2.083310 ° (the smallest calculated in the 
gene set) was applied to any exon with a mutation rate of zero. To create null gene 
sets, genes were drawn uniformly from this background distribution. Human 
protein-protein interaction data were collected from GeneMANIA on 29 
August 2011. Only direct physical interactions from the Homo sapiens database 
were considered. The list comprises approximately 1.5 million physical interac- 
tions, gathered from 150 studies. A protein interaction network was created from 
each experimental and null gene set by drawing edges between genes with physical 
interactions reported in the GeneMANIA database. Qualitatively similar results 
were achieved by including only interactions supported by multiple independent 
data sources. For each network, clustering coefficient, centralization, average shortest 
path length, density, and heterogeneity were determined using Cytoscape and 
Network Analyzer. Duplicate- and self-interactions were not considered in calculating 
network statistics. 
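
The comparison of an observed network against null gene sets can be sketched as an empirical P value on the edge count. This is a simplified illustration, not the authors' implementation: the function names, the toy interaction set and the weighted draw without replacement are assumptions, and the (hits + 1)/(replicates + 1) estimator is a conservative convention chosen here.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def count_edges(genes: Iterable[str], interactions: Set[FrozenSet[str]]) -> int:
    """Edges among `genes`; self-interactions and duplicates cannot be counted."""
    unique = sorted(set(genes))
    return sum(1 for a, b in combinations(unique, 2) if frozenset((a, b)) in interactions)


def draw_gene_set(weights: Dict[str, float], size: int, rng: random.Random) -> Set[str]:
    """Draw `size` distinct genes with probability proportional to their mutation rate."""
    names = [g for g, w in weights.items() if w > 0]
    if size > len(names):
        raise ValueError("not enough genes with positive weight")
    w = [weights[g] for g in names]
    chosen: Set[str] = set()
    while len(chosen) < size:
        chosen.add(rng.choices(names, weights=w, k=1)[0])
    return chosen


def empirical_p(observed_edges: int, weights: Dict[str, float],
                interactions: Set[FrozenSet[str]], size: int,
                n_rep: int = 10_000, seed: int = 0) -> float:
    """Fraction of null gene sets with at least as many edges as observed."""
    rng = random.Random(seed)
    hits = sum(
        count_edges(draw_gene_set(weights, size, rng), interactions) >= observed_edges
        for _ in range(n_rep)
    )
    return (hits + 1) / (n_rep + 1)


if __name__ == "__main__":
    toy = {frozenset(("A", "B")), frozenset(("B", "C"))}      # toy PPI, not real data
    rates = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0, "E": 1.0}
    print(empirical_p(count_edges(["A", "B", "C"], toy), rates, toy, size=3, n_rep=1000))
```

The same loop could record the clustering coefficient or any other statistic in place of the edge count.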

Disease gene prioritization based on PPI networks. We applied degree-aware 
algorithms to rank a set of candidate genes with respect to a set of products of genes 
associated with ASD using human PPI networks. We used the integrated human 
PPI network data collected from GeneMANIA on 29 August 2011. The PPI 
network contains 12,007 proteins with ~1.5 million direct physical interactions 
associated with a reliability score. We obtained the seed proteins for ASD from 
the list of ref. 17. For the candidate set we used 126 gene products from the severe 
disruptive de novo events from the pilot autism project (ref. 4) and the current study. 
Given the GeneMANIA PPI network and Betancur seed gene product list, we used 
DADA for ranking the candidate genes. We emphasize that this ranking is not 
implying causality but rather relatedness to genes previously and independently 
associated with ASD. For testing the significance of this ranking, we rank all the 
gene products except the seed set using the same algorithm. On the basis of the 
ranking result, we applied a Mann-Whitney U rank sum test (one-tailed) on the 
candidate set compared to all the other genes. 
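
The one-tailed rank-sum comparison can be illustrated in plain Python. This is a minimal sketch that assumes a lower rank number means closer relatedness to the seed set; it uses the normal approximation without continuity or tie correction, and the function name is hypothetical. It is not a substitute for a statistics library.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_one_tailed(candidates: Sequence[float],
                            others: Sequence[float]) -> Tuple[float, float]:
    """Test whether `candidates` tend to have smaller values than `others`.

    Returns (U, p), where U counts pairs in which a candidate value exceeds
    a value from `others` (ties count one half) and p is the lower-tail
    probability under the normal approximation.
    """
    n1, n2 = len(candidates), len(others)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in candidates] + [(v, 1) for v in others])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):                      # assign midranks to tied values
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    rank_sum = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))      # standard normal CDF at z
    return u, p


if __name__ == "__main__":
    print(mann_whitney_one_tailed([1, 2, 3], [4, 5, 6, 7]))
```

With real rankings the two inputs would be the DADA ranks of the candidate gene products and of all remaining non-seed gene products.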

MIP protocol. Each of 1,703 autism probands from the SSC collection and 744 
controls from the NIMH collection was subjected to MIP-based multiplex capture 
of the six genes: SCN1A, GRIN2B, GRIN2A, LAMC3, FOXP1 and FOXP2. For each 
library, 50 ng of DNA was used. Individually synthesized 70-mer MIPs (n = 355) 
were pooled and 5’ phosphorylated with T4 PNK (NEB). Hybridization with MIPs, 
gap filling and ligation were performed in one step for 45-48 h at 60 °C, followed by 
an exonuclease treatment of 30 min at 37 °C, similar to ref. 45, with modifications for 
reduced MIP number (B.J.O. et al., manuscript in preparation). Amplification of the 
library was performed by PCR using different barcoded primers for each library. 
Then barcoded libraries were pooled, purified using Agencourt AMPure XP and 
one lane of 101-bp paired-end reads was generated for each mega-pool (~384) on 
an Illumina HiSeq 2000 according to manufacturer’s instructions. Raw reads were 
mapped to the genome as in ref. 4. MIP targeting arms were then removed and 
variants called using SAMtools. A 25-fold coverage, with AB allele ratio <0.7, 
and quality 30 threshold was used for high-confidence variant calling. Private 
(possible de novo) variants were identified by filtering against 1,779 other exomes. 
The parents of children with disruptive rare variants were then captured. Variants 
not seen or with low coverage in the parents were validated by Sanger capillary- 
based fluorescent sequencing. No truncating variants of GRIN2B were observed in 
the MIP sequenced controls or the Exome Variant Server ESP2500 release (NHLBI 
Exome Sequencing Project (ESP), Seattle, Washington, http://evs.gs.washington. 
edu/EVS/). 

Estimating the number of autism loci. The gene-level specificity of exome 
sequencing enables the estimation of the number of recurrently mutated genes 
implicated in the genetic aetiology of sporadic ASD. This question can be 
reformulated as the ‘unseen species problem’ (see ref. 46 for review and ref. 2 
for application to de novo CNVs discovered in autism), where genes with severe de 
novo events in probands are considered ‘observed species’, and binned by their 
frequency of appearance (that is, singletons, doubletons, etc.). We estimated the 
total number of genes implicated in autism (the total number of species) using 
several different estimators (implemented in the R package SPECIES, http://www.jstatsoft.org/), 
as well as the formula provided in ref. 2. This estimate depends 
on the number of singletons and twin pairs of genes observed in probands, as well 
as the fraction of de novo events believed to be pathogenic for autism, that is, single, 
disruptive events that can cause autism on their own. We assumed that both of our 
recurrent severe de novo events (affecting CHD8 and NTNG1) were pathogenic; 
these compose the entire set of twin pairs. The number of singletons is based on the 
estimated a priori fraction of the observed events that are pathogenic for autism. 
Across this sliding scale, the estimated number of loci is plotted in Supplementary 
Fig. 13. For example, using the estimator from ref. 47, if 20-50% of our de novo 
severe events are considered pathogenic, exome sequencing of a large number of 
additional samples would reveal between 182 and 992 pathogenic genes harbour- 
ing coding de novo point mutations (Supplementary Fig. 13); if all the observed 
severe de novo events in our experiment are included as pathogenic singletons, the 
number of implicated loci increases to more than 3,000. 
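
To make the 'unseen species' reasoning concrete, the sketch below uses the Chao1 lower-bound estimator. Chao1 is swapped in here purely for illustration: the text relies on estimators from the R package SPECIES and on the formulas of refs 2 and 47, which are not reproduced, so the output is not expected to match the published figures. The function names are hypothetical, and scaling the singleton count by the assumed pathogenic fraction follows the description above.

```python
def chao1(singletons: float, doubletons: float, higher: float = 0.0) -> float:
    """Chao1 estimate of the total number of species (here, genes).

    Uses S_obs + f1^2 / (2 * f2), falling back to the bias-corrected form
    S_obs + f1 * (f1 - 1) / 2 when no doubletons were observed.
    """
    observed = singletons + doubletons + higher
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    return observed + singletons * (singletons - 1) / 2


def estimate_loci(singleton_genes: int, recurrent_genes: int,
                  pathogenic_fraction: float) -> float:
    """Scale singleton genes by the assumed pathogenic fraction; keep all recurrent genes."""
    if not 0.0 <= pathogenic_fraction <= 1.0:
        raise ValueError("pathogenic_fraction must lie in [0, 1]")
    return chao1(pathogenic_fraction * singleton_genes, recurrent_genes)


if __name__ == "__main__":
    # Assumption: 126 severely hit genes, two of them recurrent, leaves 124 singletons.
    for fraction in (0.2, 0.5, 1.0):
        print(fraction, round(estimate_loci(124, 2, fraction)))
```

Because the estimate grows with the square of the singleton count, it is very sensitive to the assumed pathogenic fraction, which is the behaviour the sliding scale in Supplementary Fig. 13 is meant to show.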
Circadian clocks have evolved to synchronize physiology, metabolism 
and behaviour to the 24-h geophysical cycles of the 
Earth. Drosophila melanogaster’s rhythmic locomotor behaviour 
provides the main phenotype for the identification of higher eukaryotic 
clock genes. Under laboratory light-dark cycles, flies show 
enhanced activity before lights on and off signals, and these 
anticipatory responses have defined the neuronal sites of the corresponding 
morning (M) and evening (E) oscillators. However, 
the natural environment provides much richer cycling environ- 
mental stimuli than the laboratory, so we sought to examine fly 
locomotor rhythms in the wild. Here we show that several key 
laboratory-based assumptions about circadian behaviour are not 
supported by natural observations. These include the anticipation 
of light transitions, the midday ‘siesta’, the fly’s crepuscular activity, 
its nocturnal behaviour under moonlight, and the dominance of 
light stimuli over temperature. We also observe a third major 
locomotor component in addition to M and E, which we term ‘A’ 
(afternoon). Furthermore, we show that these natural rhythm phe- 
notypes can be observed in the laboratory by using realistic temper- 
ature and light cycle simulations. Our results suggest that a 
comprehensive re-examination of circadian behaviour and its 
molecular readouts under simulated natural conditions will provide 
a more authentic interpretation of the adaptive significance of this 
important rhythmic phenotype. Such studies should also help to 
clarify the underlying molecular and neuroanatomical substrates 
of the clock under natural protocols. 

Drosophila melanogaster provides a prominent model system in 
higher eukaryotes for studying the molecular and neurogenetic basis 
of circadian behavioural rhythms. The fly’s 24-h rhythmic locomotor 
activity has morning (M) and evening (E) components, interpreted in 
numerous laboratory studies to reflect the fly’s anticipation of regular 
changes in light-dark transitions (Fig. 1a). Neurogenetic dissection 
has revealed discrete M and E neuronal clusters that determine these 
circadian landmarks. We wondered how the complex naturally cycling 
geophysical environment might affect these behavioural patterns, 
so we investigated fly locomotor rhythms in wild-type and clock 
mutant strains under natural conditions, by placing activity monitors 
outdoors for three seasons (2007-2009), from April to November in 
Leicester, UK (latitude 52° 38’ N), and Treviso, Italy (45° 65’ N) (Sup- 
plementary Fig. 1). 

Generally, and throughout the seasons, clock mutants show very 
high levels of rhythmicity (85-100%) that reflect the cycling environ- 
ment driving (entraining) their behaviour, except Clk in which shows 
decreased rhythmicity (40%) at lower temperatures (Supplementary 
Table 1). Figure 1b-d illustrates locomotor profiles of a natural wild- 
type strain (WTALA) and clock mutants from summer observations 
in Treviso. Contrary to laboratory-based results, at warmer temperatures 
the ‘siesta’ (decreased activity during the hottest parts of the day; 
Fig. 1a) was not observed. Instead, and in addition to the well-established 
M and E locomotor peaks, a prominent mid-afternoon 
component, which we term ‘A’, was present in more than 50% of 
wild-type flies at mean daytime temperatures of 20 °C, rising to about 
100% at 27°C (Fig. 1c). Laboratory studies in light-dark cycles at 
constant high temperatures of 29 °C (refs 8, 9), or in rectangular light 
and temperature cycles, do not reveal A components (Fig. 1a). This 
previously unobserved behaviour was seen in all genotypes at warmer 
temperatures (Fig. 1d and Supplementary Fig. 2) and may represent a 
stress/escape response that is phase-locked to the rise and fall in tem- 
perature. Alternatively, but not exclusively, A may reflect an environ- 
mentally modulated circadian phenotype. We observed that the A 
component was significantly advanced by up to 3h in short-period 
per® mutants in comparison with other per genotypes; per”’ was also 
advanced compared with per” and the wild type, and the A offset was 
significantly delayed in per’ compared with the other genotypes 
(Fig. 1e and Supplementary Fig. 2), suggesting that A is modulated 
by the clock and is not a simple escape reflex. 

Under laboratory 12 h light-12 h dark cycles, wild-type flies anticip- 
ate lights-on by increasing their locomotor activity about 2 h before the 
lights-on signal, in contrast with arrhythmic mutants such as per”’, 
which do not anticipate the signal. Consequently, in nature, if wild-type 
flies anticipate a dawn-related geophysical transition, clock 
mutants would be expected to ‘not anticipate’ and delay their activity 
accordingly. In nature, astronomical, nautical and civil twilights, 
defined by the angles of the Sun below the horizon (18°, 12° and 6°, 
reflecting illumination of approximately 0.001, 0.01 and about 1 lx, 
respectively), accompany the dawn-dusk transitions. We therefore 
used nautical twilight, which occurs roughly 1.5-2 h before dawn, as a 
convenient environmental marker by which to compare the onset of 
morning activity (Monset) among genotypes. Seasonally, the average 
daily Monset for WTALA flies in Leicester and Treviso reveals that the 
two locations differ in their overall profiles (Supplementary Fig. 3a). 
However, when the values are replotted against average night temper- 
ature, this difference evaporates, revealing a similar inverse relation- 
ship between Monset and temperature in three wild-type strains 
(Fig. 2a-c), clock mutants and Gal4-driven genotypes (Fig. 2d-j and 
Supplementary Fig. 4). The exception is tim”, for which Monset is only 
marginally temperature-dependent (Fig. 2d) and is rescued by trans- 
formation with the tim* gene, and in per stim”! double mutants 
(Fig. 2e, f). Ablating the Pigment Dispersing Factor (PDF)-expressing 
(Morning) neurons with UAS-hidUAS-rpr, or using the UAS-CycA 
dominant-negative to stop the clock in these cells, failed to disrupt 
the relationship between Monset and temperature (Supplementary 
Fig. 4). 

We investigated whether clock mutants ‘anticipate’ changes in twilight 
differently from the wild type, mirroring laboratory studies®’. We used 
an analysis of covariance (ANCOVA; temperature as covariate) to

Figure 1 | The afternoon (A) component of locomotor activity. a, An
individual male’s (WTALA) locomotor activity in laboratory conditions with 16 h
light-8 h dark and 22 °C in the dark phase and 30 °C in the light. M, morning; S,
siesta; E, evening (yellow, light intensity; red, temperature). b, An individual
male’s (WTALA) locomotor activity on a single summer’s day (yellow, light
intensity; red, temperature; day temperatures: average 28.7 °C, maximum 34.4 °C,
minimum 23.4 °C). M, morning; A, afternoon; E, evening. Time 0 = midnight
(00:00). c, Afternoon peak is associated with temperature (R² = 0.34, F1,35 = 17.7;
P < 0.0002). d, Averaged locomotor plots (means ± s.e.m.) for WTALA and four
arrhythmic mutants (see Supplementary Fig. 2 for other mutants). WTALA
(n = 19), per01 (n = 12) and ClkJrk (n = 27, with 3 arrhythmic flies) were
monitored together (average temperature over experiment 28.0 °C, maximum
34.4 °C, minimum 21.1 °C), whereas tim01 (n = 17) and the double mutant
per01;tim01 (n = 12) were monitored at a different time (average temperature
29.3 °C, maximum 35.9 °C, minimum 23.1 °C). Red, temperature; yellow, light
intensity. e, Afternoon component in per mutants compared with wild-type ALA
(WTALA). Times of Aonset (green squares), Apeak (red circles) and Aoffset (blue
triangles) of WTALA were subtracted from the corresponding values for the
mutants because not all genotypes were run in the same experiment. Negative
values indicate an earlier phenotype than WTALA. ANOVAs for all parameters
gave significant genotype effects (F215; = 4.8 (onset), 8.9 (peak) and 29.8 (offset);
all P < 0.01). In post hoc tests, perS was significantly earlier in all parameters than
perL, and significantly earlier than per01 in peak and offset. perL was significantly
delayed compared with per01 in all parameters.


inspect the average time from nautical twilight that each clock variant
began morning activity, but only Pdf01, which was slightly delayed,
fell significantly outside the range of the three wild types (Fig. 2i and
Supplementary Table 2a). These results suggest that clock genotypes
(except tim01) generally anticipate geophysical variables equally.
Alternatively, if Monset activities are simply temperature-sensitive 
responses to dim light levels, they should be delayed on the following 
morning if the activity monitors are covered up after the previous 
sunset. When we performed this experiment, significant delays of
about 1 h were observed for all genotypes (except tim01), with Monset
now tracking the morning rise in temperature (Fig. 2k, l,
Supplementary Fig. 5 and Supplementary Table 2b). The attenuated
tim01 Monset twilight-modulated temperature responses (Fig. 2d and
Supplementary Fig. 5a) recall the mutant’s disrupted diapause, a
photoperiodic phenotype also requiring both light and temperature 
input’. We conclude that Monset is a twilight-dependent temperature 
response with little evidence for circadian regulation. 

We next examined evening onset (Eonset), which is seasonally
delayed relative to the photoperiod during the warmer summer
months (Supplementary Fig. 3b). We replotted Eonset against average
daytime temperature, relative to the time of maximum daily
temperature (Tmax; Fig. 3 and Supplementary Fig. 6). All genotypes showed
Eonset activities that began successively later as mean daily
temperatures rose above 20 °C (Fig. 3 and Supplementary Fig. 6). At less than
20 °C, the response was stable in relation to Tmax, with marginally
significant negative relationships for WTALA, HU and ClkJrk strains
(Supplementary Table 3a) and with Eonset generally occurring within
the 2.25 ± 0.17 h (mean ± s.e.m.) that defined the time between the
maximum daily light intensity and the subsequent Tmax (Fig. 3a-i).
When analysed separately for temperatures below and above 20 °C,
perS showed the earliest Eonset activities, followed by per01 and tim01;
perL had the latest (Supplementary Table 3b, c). This is reflected in
Fig. 3, where the data points fall either mostly above (later, perL) or
below (earlier, perS, per01, tim01) the upper dotted line representing
Tmax. These results mirror laboratory findings in which perS and perL
modulate the timing of evening behaviour’; in addition, however, the
unexpected observation of E components in arrhythmic mutants, and
their phase advance, indicates an underlying residual short-period
rhythmicity"'*"°.

In laboratory wild-type flies, the length of the temperature-sensitive 
afternoon ‘siesta’ is mediated by the thermal regulation of per 3’
splicing*’°. Eonset defines the end of the siesta, so we studied whether the per
splicing readout correlated with the Eonset curve. Levels of per 3’
splicing in fly heads throughout the seasons for WTALA flies (Fig. 3j) were
linearly related to temperature over the entire range 7-30 °C, so it is
unlikely that splicing contributed significantly to Eonset at temperatures
below 20 °C (Supplementary Fig. 6F). Indeed, because the laboratory
siesta is mediated by the dynamics of the upswing of the clock protein
PERIOD (PER)*, the fact that per01-null mutants showed a similar
Eonset-temperature relationship to that of the wild type also argues 
against a significant role for per 3’ splicing in this natural phenotype. 
A similar conclusion using splice-locked per transformants was 
reached in laboratory experiments using artificial temperature and 
light cycles’”. 

Drosophila activity rhythms in the laboratory are often described as 
‘crepuscular’ because of the distribution of M and E components at 
dawn and dusk'*. We therefore quantified the activity falling in the 
morning and evening twilights (astronomical, nautical and civil) as a 
proportion of all activity falling between the beginning of morning and 
the end of evening astronomical twilights. Using data from Italian 
experiments, we observed that even at the warmest temperatures when 
Monset and Eonset are pushed towards the twilights (Figs 1-3), the
proportion of activity falling within the twilights was less than 25% 
and was reduced still further at cooler temperatures (Fig. 3k). Thus, in 
nature the major proportion of activity, even at high temperatures, falls 
outside the twilights, so flies are diurnal rather than crepuscular. 
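
The proportion described in this paragraph is a simple ratio, and a minimal sketch may make the bookkeeping concrete. It assumes activity is available as (time in hours, count) pairs and that each twilight window runs from the start of astronomical twilight to sunrise in the morning and from sunset to the end of astronomical twilight in the evening. The function name `crepuscular_fraction` and the example numbers are hypothetical and are not data from the study.

```python
def crepuscular_fraction(activity, morning_twilight, evening_twilight):
    """Fraction of activity that falls inside the two twilight windows.

    activity: iterable of (time_h, count) pairs for one day.
    morning_twilight: (start_h, end_h), astronomical twilight start to sunrise.
    evening_twilight: (start_h, end_h), sunset to astronomical twilight end.
    The denominator is all activity between the start of the morning window
    and the end of the evening window.
    """
    m_start, m_end = morning_twilight
    e_start, e_end = evening_twilight
    total = 0.0
    in_twilight = 0.0
    for time_h, count in activity:
        if not (m_start <= time_h < e_end):
            continue  # activity in full night is excluded from the ratio
        total += count
        if m_start <= time_h < m_end or e_start <= time_h < e_end:
            in_twilight += count
    return in_twilight / total if total > 0 else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = [(3.5, 2), (4.5, 3), (8.0, 20), (14.0, 15),
               (20.5, 5), (21.5, 5), (23.5, 4)]
    print(crepuscular_fraction(example, (3.0, 5.0), (20.0, 22.0)))
```

A value below 0.5 on most days would correspond to the diurnal pattern reported in the text, whereas a truly crepuscular animal would concentrate most of its activity in the two windows.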

Related laboratory experiments have also suggested that locomotor 
activity becomes predominantly crepuscular and nocturnal under

Figure 2 | The morning onset (Monset) of locomotor activity is dependent on
temperature and twilight. a-j, Time of Monset in a single day compared with
nautical twilight (dotted line, time 0). Blue and yellow areas represent the
approximate time between astronomical and nautical twilight and between
nautical twilight and sunrise, respectively (both calculated as seasonal averages
for Treviso). Each point represents the mean of 4-24 flies (median 14); error
bars represent s.e.m. Blue squares, Leicester; red circles, Treviso. a-c, Wild


simulated moonlight and this correlates with delayed expression of the 
clock proteins PER and TIMELESS (TIM) in the fifth PDF-negative 
sLNv neuron’””®. We calculated the proportion of WTALA nocturnal 
activity during new moons and immediately before and after full 
moons. Figure 4a reveals no significant differences in nocturnal activity 
between the two conditions, even when we varied the exposure to 
moonlight from 0.03 to 0.33 lx (Supplementary Fig. 7). When we
analysed PER in adult brains by immunocytochemistry under conditions
of full moon and no moon, consistent with the behaviour, PER
expression in the fifth sLNv (Fig. 4b) was not significantly delayed in
moonlight in comparison with the other lateral clock neurons (s-LNvs,
l-LNvs and LNds). The dorsal DN1 and DN2 neurons showed
significantly earlier PER peaks than the other neurons (Fig. 4b and
Supplementary Fig. 8), indicative of a faster underlying oscillation”. 
These results suggest that natural temperature and light cycles may 
counteract the delayed PER expression of the fifth sLNv observed 
under simulated moonlight. 

Finally, we attempted to reproduce natural circadian phenotypes 
within the laboratory by simulating cycling temperature and light 
intensity. We simulated consecutive days of new moon (darkness at 
night), with astronomical twilight (0.005 lx) giving way gradually to 


types: WTALA (Alto Adige) (a), Canton S (b) and HU (Houten) (c). d-j, Clock
mutants: tim01 (d), tim01;P[tim+] (e), per01;tim01 (f), per01 (g), ClkJrk (h), Pdf01
(i) and cry0 (j) (Supplementary Fig. 4). The slope for tim01 is significantly
different from that for all three wild types (Supplementary Table 2a). k, l, Monset
of wild type (k) and per01 mutant (l) before (red) and after (black) activity
monitors were covered at sunset and maintained until mid-afternoon
(Supplementary Fig. 5 and Supplementary Table 2b).


nautical twilight (0.05 lx), to civil twilight (1 lx) over a 2-h period at the
beginning and (in reverse) at the end of the day, rising to a maximum
intensity of 1,500 lx in the afternoon. We ran this at cycling day-night
temperatures ranging from 20 to 30 °C (spring) or from 25 to 35 °C
(summer), mimicking actual temperatures that we measured in two
Treviso spring and summer days, respectively, when the photoperiod
for both was roughly 16 h light-8 h dark. We observed that in the
summer simulation, M, A and E components were prominently
displayed by all genotypes (WTALA, per mutants, tim01 and ClkJrk),
although ClkJrk showed a smaller E burst on most days (Fig. 4c and
Supplementary Fig. 9). We confirmed that Monset occurs between the
simulated times of astronomical and nautical twilight, as in the wild at
these temperatures, with all genotypes becoming active 0.2-0.7 h
before simulated summer nautical twilight, but later (between 0.1
and 0.4 h) before simulated spring twilight (Supplementary Fig. 9c).
As in the wild, Monset is temperature-sensitive, with no significant
differences between the wild type and either per01 or tim01 mutants
in the timing of Monset (although perS and perL were slightly earlier in
their Monset for both spring and summer simulations), and with tim01
showing a similar temperature independence to that in the wild
(Supplementary Fig. 9c). We observed the circadian modulation of

Figure 3 | Evening onset is temperature and clock modulated. a-i, Evening
onset compared with daily maximum temperature (Tmax, 0). Dotted line below
Tmax represents the average time that light peaked (about 2.25 h) before Tmax.
Blue squares, Leicester; red circles, Treviso. a-c, Wild types: WTALA (Alto
Adige) (a), Canton S (b) and HU (Houten) (c). d-i, Clock mutants: per01
(d), perL (e), perS (f), tim01 (g), per01;tim01 (h) and ClkJrk (i) (see also
Supplementary Fig. 6 and Supplementary Table 3). Most data points below
20 °C fall between the times of the light and temperature peaks. per01, tim01 and
perS have earlier Eonset (ANCOVA: P ~ 0, 0.003 and 0.0007, respectively, versus
WTALA) as shown by the lower scores on the y axis; perL has later onset
(P = 0.0003 versus wild-type). j, per splicing in nature in fly heads. per spliced
isoform compared with total per (per spliced + per unspliced) against average daily
temperature in WTALA strain. Heads collected at 00:00 (black squares), 03:00
(red circles), 12:00 (blue triangles) and 15:00 (green triangles). k, Locomotor
activity is neither crepuscular in natural conditions nor in simulated natural
conditions. Red circles, mean ± s.e.m. of daily crepuscular activity in natural
condition of groups of WTALA flies in Italy from single multi-day experiments;
black squares, Italian spring and summer laboratory simulations (see Fig. 4c).


the E component in our simulations (Supplementary Fig. 9e) and 
whereas nearly all flies, irrespective of clock genotype, showed a prom- 
inent A component in the summer simulation, this proportion fell to 
20-50% in the spring simulation, matching our results from nature and 
revealing the importance of temperature for the expression of A 
(Fig. 4c and Supplementary Fig. 9b, d). We also observed similar pro- 
portions of crepuscular activity for the spring and summer simulations 
to those in our natural observations (Fig. 3k). We did not, however, 
reproduce the advance in the A component of perS compared with the
wild type that we observed in nature (Supplementary Fig. 9d). 

Our results from the wild reveal a diminished role for ‘anticipation’ 
of light-dark transitions partly because arrhythmic mutants also show 
M and E components, and also because the M component appears to 
be a response to changes in the environment. However, the newly

Figure 4 | Diurnal phenotypes of D. melanogaster. a, The proportion of
nocturnal locomotor activity of WTALA males does not differ between days of
full moon (open circles) and no moon (filled circles) (ANCOVA: F1,32 = 0.04,
P = 0.84). Results are means ± s.e.m. and are based on 4-24 flies (median 17).
b, PER immunofluorescence intensity (mean ± s.e.m., arbitrary units) in clock
neurons in May 2008 (left) and September 2008 (right) (new and full moon,
respectively, corresponding to filled and open circles in a). Red, fifth Pdf-null
sLNv; brown, sLNvs; green, LNds; turquoise, l-LNvs. DN1 neurons (black) and
DN2 neurons (purple) are significantly advanced (F5,493 = 13.71, P ~ 0; see also
Supplementary Fig. 8). There is a significant advance for all neurons in
September in comparison with May (F1,493 = 34.1, P ~ 0) but no
month × neuron interaction (F5,493 = 0.79, P = 0.56). May, lights on at 05:16,
lights off at 20:59, photoperiod 15 h 43 min; September, lights on at 06:20, lights
off at 19:50, photoperiod 13 h 30 min. c, Simulation of Italian summer in the
laboratory for the wild type and for arrhythmic mutants (16 h light-8 h dark,
temperature range 25-35 °C; see also Supplementary Fig. 9). n = 19-32 for
each genotype at each temperature.


described A and the E components are clock-modulated, unlike M. 
This may be because M occurs during the rapid changes in illumina- 
tion from quasi-darkness to bright light, when temperature is constant 
at its daily minimum, whereas A and E components occur when the 
light intensity is peaking or falling rapidly but temperature is either still 
rising or falling gradually. Such conflicting environmental signals may 
have recruited the clock to regulate appropriately adaptive A and E 
locomotor responses. Our results also show that temperature is the 
critical variable for predicting circadian behaviour’’, because the large 
differences in photoperiod between our two locations (more than 3h 
in midsummer) contribute little to the behavioural variance. In con- 
trast, laboratory studies that place temperature and light cycles in 
different phases to each other” reveal that light is the most important 
driver of circadian behaviour. 

Although lateral and dorsal circadian neurons underlying M and E 
behaviour in light-dark cycles at constant temperature have been 
identified®’, subsets of dorsal neurons and lateral posterior neurons 
may be relevant for circadian temperature signalling’ *°. Perhaps in 
the wild it is these neurons that dominate the neuronal circuit. 
However, our observation that various ‘clockless’ genotypes can 
nevertheless show M and E (and A) components also suggests that 
the underlying neuronal substrates for this behavioural programme 
may not reside solely in the lateral and dorsal neurons but also 
elsewhere in the brain. Clock gene expression enhances and modulates 
circadian behaviour in the wild, but the possibility remains that a 
residual short-period rhythmicity resides within per and tim ‘arrhyth- 
mic’ mutants*’*”* that can be amplified under natural conditions. Our 
results resonate with studies of laboratory mice”” and hamsters”,
which show them to behave quite differently in nature, where they are
predominantly diurnal and not nocturnal as in the laboratory. 
Similarly, generally accepted adaptive and mechanistic explanations 
for fly circadian behaviour from laboratory experiments may require 
some revision if they are to account for rhythmicity in nature. 


METHODS SUMMARY 

Fly strains. Strains WTALA and HU are natural isolates collected as isofemale 
lines from northern Italy and The Netherlands’*. Canton-S is a standard laboratory 
strain. per01, perS and perL are congenic and Cantonized, but the genetic
backgrounds of the other mutants used (per01;tim01, ClkJrk, Pdf01, cry0) and Gal4-driven
transgenes (drivers tim-gal4 and Pdf-gal4), the apoptotic UAS-hidUAS-rpr and the
dominant-negative UAS-CycΔ (ref. 29) are not equivalent, and for most are
unknown.

Behaviour. Flies were raised at 25 °C in 12 h light-12 h dark cycles in the laboratory 
before males were placed outside at 2-3 days of age in Trikinetics activity monitors. 
Morning (Monset) and afternoon or evening (Aonset, Eonset) onsets were determined
operationally for each fly on each day of every experiment. Circadian rhythmicity 
was assessed by spectral analysis and autocorrelation as reported previously*”. 
Moonlight measurements. Measurements of moonlight were made in Treviso 
and Leicester, using a LI-210SA Photometric Sensor (LI-COR) connected to a 
LI 1400 data logger. 

Twilights. Twilights were obtained from the online database of the United States 
Naval Observatory (USNO) Astronomy Application Department”’. 
Immunocytochemistry. The slides were examined under a Nikon 80i microscope 
with QICAM Fast Camera. Individual images were taken of planes at different 
depths to create a z-series for each brain. Brains were stained with primary antibodies 
against PDF and PER (the latter was a gift from R. Stanewsky). 

RNA quantification. Flies were collected every 3 h for a 24-h time course in
natural conditions, and fixed in liquid nitrogen. Heads were removed and the per spliced
and per unspliced forms and the control gene Cbp20 (Cap binding protein 20) were
amplified by PCR. PCR products were revealed on a 2% agarose gel and imaged, and
final quantification was obtained with ImageJ software. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS

Fly strains. Strains WTALA and HU are natural isolates collected as isofemale 
lines from northern Italy (Alto Adige 46° 30’ N) in 2004 and from Houten, The
Netherlands (52° 2’ N) in 2004, respectively. Canton-S is a standard laboratory
strain. per01, perS and perL are congenic and were backcrossed repeatedly to a
Cantonized per deficiency Df(1)64j4 background by C.P.K. in the early 1980s. 
The genetic backgrounds of the other fly strains used are varied, and for most 
they are unknown. 

In the laboratory, even at a constant 10 °C, we observed that UAS-CycΔ (ref. 29)
was effectively driven by the tim-gal4 driver and gave an arrhythmic locomotor
profile compared to the control strain (Supplementary Fig. 10). All flies were raised 
to adulthood at 25 °C in the laboratory before being placed outside at 2-3 days of 
age. We confirmed that UAS-hidUAS-rpr was effective in eliminating the clock 
neurons under tim- and Pdf-gal4 control using immunocytochemistry (data not 
shown). 
Locomotor activity. Locomotor activity was monitored in suburban gardens well 
away from street lighting and from any major roads. Flies were placed in 
Trikinetics activity monitors, and shielded from direct sunlight and from rain 
(Supplementary Fig. 1). In Leicester this was accomplished by placing the monitors
on a table within a child’s plastic playhouse (Toys R Us) with approximate internal
dimensions of 4 × 4 × 4 ft, with open windows in a north-south direction that
was shaded by an adjacent large tree. Bricks placed on the roof ensured that the 
house did not blow away during high winds. Cables from monitors were fed from 
the back of the playhouse into a computer that was placed inside an immediately 
adjacent extension to C.P.K.’s house. In Italy, monitors were placed under a 
penthouse roof in a garden roughly 4km from the city centre of Treviso, again 
with no direct sunlight, with cables fed to a computer within the penthouse. Flies 
were monitored from March to November (Treviso) or from May to September 
(Leicester). Simultaneous and continuous monitoring of temperature, light 
intensity and humidity were made with recorders placed adjacent to the activity 
tubes. Circadian rhythmicity was assessed by spectral analysis and autocorrelation 
as reported previously”’. 

Morning (Monset) and afternoon or evening (Aonset, Eonset) onsets were
determined for each fly on each day of every experiment (see Supplementary Fig. 11 for
examples). Monset was considered to be present when a bout of activity occurred
after a period of rest during the dark phase, if continuous movement with a steady 
increase in activity (and no more than one half-hour time bin without any or lower 
activity than the previous bin) led to a peak, followed by a steady decrease in 
activity defining the offset. Activity bouts in the middle of the night (not leading 
continuously into the morning) were not considered. In general, the window of 
time that contained the morning bout was 3 h before or after the 30-min time bin
during which the light intensity reached 1 lx. If there was no bout of activity
consistent with the parameters described above, it was noted that there was no 
morning component for that fly on that day. 
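
The operational rule in this paragraph can be approximated in code. What follows is a minimal sketch under stated assumptions, not the authors' scoring procedure: activity is taken to be a list of counts in half-hour bins, 'rest' is taken to mean a zero-count bin, and the function name `find_morning_bout` and the parameter `max_dips` are hypothetical. The sketch does not apply the time window around dawn that the text also describes.

```python
from typing import Optional, Sequence, Tuple


def find_morning_bout(counts: Sequence[int],
                      max_dips: int = 1) -> Optional[Tuple[int, int]]:
    """Return (onset_bin, peak_bin) of the first qualifying bout, else None.

    A bout starts at the first active bin after a zero-count bin. Activity
    must then rise to a peak, tolerating at most `max_dips` bins that are
    not higher than the bin before them, and must fall after the peak.
    """
    n = len(counts)
    for start in range(1, n):
        # Candidate onset: an active bin immediately after a resting bin.
        if counts[start - 1] != 0 or counts[start] == 0:
            continue
        peak = start
        dips = 0
        for i in range(start + 1, n):
            if counts[i] > counts[i - 1]:
                if counts[i] > counts[peak]:
                    peak = i
            else:
                dips += 1
                if dips > max_dips:
                    break  # a second non-rising bin ends the scan
        has_rise = peak > start
        has_decline = peak + 1 < n and counts[peak + 1] < counts[peak]
        if has_rise and has_decline:
            return start, peak
    return None


if __name__ == "__main__":
    # Made-up half-hour counts for illustration only.
    print(find_morning_bout([0, 0, 2, 5, 4, 9, 12, 6, 3, 0]))
```

Returning `None` corresponds to recording that no morning component was present for that fly on that day. A real implementation would need the authors' exact thresholds, which are not given here.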

Eonset was considered to be present when a bout of activity occurred after a
period of rest during the day. If a burst of activity occurred in the middle of the 
afternoon on warmer days, with another clear bout after it, the latter was con- 
sidered to represent the Eonset, whereas the former represented the afternoon 
component. The evening and afternoon bouts of activity had to be composed of 
continuous movement with no more than one zero activity bin interspersed 
within, and with a steady increase and decrease in activity levels defining the 
onset/offset. If there was no bout of activity consistent with the parameters 
described above, it was noted that there was no evening peak for that fly on that 
day. 

Moonlight measurements. Measurements of moonlight were made in Treviso 
and Leicester, using a LI-210SA Photometric Sensor (LI-COR) connected to a 
LI 1400 data logger. Five-minute readings were made four times during selected 
nights of full moon or new moon, and the minimum and maximum light levels 
were recorded, both in fully exposed full moonlight conditions and in the more 
sheltered positions in which the Trikinetics monitors were normally maintained. 
Average maximum nightly full moonlight levels were 0.23 ± 0.15 lx in fully
exposed positions, in comparison with 0.07 ± 0.01 lx in sheltered conditions in
Leicester (30 November 2009). The corresponding values in Treviso were
0.33 ± 0.07 lx in exposed positions and 0.03 ± 0.01 lx in sheltered conditions. At
new moon, in both Leicester and Treviso, the light values were 0 lx.

Twilights. Twilights were obtained from the online database of the United States 
Naval Observatory (USNO) Astronomy Application Department". 


Natural temperature profiles. These were reproduced in the laboratory by using 
the Memmert ICP 700 incubator with a refrigeration unit. The Celsius 2007 
computer program, running under the Windows XP operating system, was used 
for programming, controlling and documenting the Memmert ICP 700 using the 
USB interface. The incubator can generate smooth cycling temperature ranges 
without step-ups. 

Natural light profiles. These were simulated by using a programmable daylight 
simulator based on a battery of white-light LEDs of different intensities that could be 
programmed to generate smooth cycling profiles (without step-ups) of light ranging 
from 0.005 lx (astronomical twilight) to 1,500 lx. The front-end programming
allowed the simulation of any observed natural cycle of light intensity. The equip- 
ment was designed and built by EURITMI, a spin-out from the Electronic 
Engineering Unit at the Venetian Institute of Molecular Medicine (Padova, Italy). 
Immunocytochemistry. Flies were collected every 3 h for a 24-h time course in
natural conditions and fixed for 3 h in 4% paraformaldehyde. After three washes 
(15 min each) in PBS, the brains were dissected under a stereomicroscope (Leica 
MB6). Brains were washed five times for 6 min each in 0.3% PBST (PBS + 0.3% 
Triton X-100). They were then permeabilized for 10 min in 1% PBST and blocked 
for 2 h with 1% BSA. After blocking, the brains were incubated for 3 days at 4 °C, in
the primary antibody diluted in 0.1% BSA in 0.3% PBST. They were then washed 
five times for 6 min each in 0.3% PBST. After washing, they were blocked again for 
1 h with 1% BSA. The brains were then incubated overnight at 4 °C in the
appropriate secondary antibody. The antibodies used for the immunocytochemistry
experiments were anti-PDF (Developmental Studies Hybridoma Bank, dilution 
of 1:5,000), anti-PER (from R. Stanewsky; 1:2,500), and anti-mouse and anti- 
rabbit (both from Invitrogen; 1:500). 

Brains were mounted onto slides (VWR) with a drop of mounting medium 
(Vectashield; Vecta Laboratories, Inc.) and covered with a coverslip 0.1 mm thick 
(VWR). The slides were observed under a Nikon 80i microscope with a QICAM 
Fast Camera. Individual images were taken of planes at different depths to create a 
z-series for each brain. The size of the sections forming a z-series was 1.0 ± 0.2 µm.
The images were viewed and quantified with ImageJ version 1.42g (http://rsb.info.nih.gov/ij/).
The average pixel intensity for each neuron was measured together
with the signal from its corresponding background area. The final amount of
signal was calculated using the formula intensity = 100 × (signal − background)/background.
The values of intensity obtained were normalized using
the highest value as 100% and then plotted with Microsoft Excel 2003 or
OriginPro 8.0, and statistical analyses were performed with Statistica 8 (StatSoft). 
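
The intensity formula and the normalization to the highest value can be written out as a short sketch. The function names `relative_intensity` and `normalize_to_max` are hypothetical, and the input numbers in the usage lines are invented for illustration; the authors performed these steps with ImageJ measurements and spreadsheet software.

```python
def relative_intensity(signal: float, background: float) -> float:
    """intensity = 100 x (signal - background) / background."""
    if background <= 0:
        raise ValueError("background must be positive")
    return 100.0 * (signal - background) / background


def normalize_to_max(values):
    """Rescale a series so that its highest value becomes 100%."""
    top = max(values)
    if top <= 0:
        raise ValueError("highest value must be positive to normalize")
    return [100.0 * v / top for v in values]


if __name__ == "__main__":
    # Invented mean pixel values for one neuron type across time points.
    raw = [relative_intensity(s, b) for s, b in [(110, 100), (120, 100), (140, 100)]]
    print(raw)
    print(normalize_to_max(raw))
```

Dividing by the background makes the measure unitless, so brains imaged at slightly different exposure levels can be compared before the series is rescaled.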
RNA quantification. Flies were collected every 3 h for a 24-h time course in
natural conditions and fixed in liquid nitrogen. Heads were removed and total 
RNA was recovered and extracted using Trizol Reagent (Gibco) as recommended 
by the kit protocol. Reverse transcription was initiated on total RNA, after any 
contaminating genomic DNA was removed with DNase (Promega), using the 
SuperscriptII (Invitrogen) reverse transcriptase with the 17-bases oligo(dT). The 
reaction was performed for 1 h at 42 °C and for 15 min at 75 °C. The per spliced and per unspliced
forms and the control gene Cbp20 (Cap binding protein 20) were amplified by PCR
with a PTC-100 Peltier Thermal Cycler (MJ Research) using primers listed below. 
PCR was performed as follows: initial denaturation at 95 °C for 3 min, then 28
cycles consisting of 95 °C for 1 min, 62.1 °C for 1 min, and 72 °C for 45 s. The
reaction was completed by an elongation step of 10 min at 72 °C. Amplifications
were carried out in 20-µl reaction mixtures containing 75 ng of cDNA target, 1 µl
of each primer (10 µM), 1.6 µl of dNTPs (2 mM), 4 µl of Green Buffer (5×) and
0.4 µl of GoTaq DNA polymerase (Promega). Primers used were as follows: per,
5'-AAGACGGAGCCGGGCTCCAG-3’ (forward; base pairs 6421-6440; NCBI 
X03636) and 5'-TCTACATTATCCTCGGCTTGC-3’ (reverse; base pairs 7201- 
7221; NCBI X03636); cbp20, 5'-GTCTGATTCGTGTGGACTGG-3’ (forward; 
base pairs 540-559; FlyBase ID FBgn0022943) and 5'-CAACAGTTTGCCATA 
ACCCC-3’ (reverse; base pairs 653-672; FlyBase ID FBgn0022943). 

PCR products were revealed on a 2% agarose (Eurobio) gel under ultraviolet 
radiation. Images were collected with the Quantity One 4.6 (Bio-Rad). The final 
quantification was obtained with ImageJ software by applying the formula
intensityRN (pixel) = band intensity of per spliced or per unspliced/band intensity 
of cbp20. Each band’s intensity was corrected for background noise. For each 
collection, three independent RNA extractions, reverse transcriptions, amplifica- 
tions and quantifications were performed. The results were normalized and ana- 
lysed with Statistica8 (StatSoft). The plots were obtained with OriginPro 8.0 
software. 
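
The band quantification described here reduces to a background-corrected ratio against the cbp20 control, averaged over the three independent replicates. The sketch below illustrates that arithmetic only; the function names, the per-band background arguments and the example pixel values are assumptions, and the authors' exact normalization in Statistica is not reproduced.

```python
from statistics import mean


def normalized_band_intensity(per_band: float, cbp20_band: float,
                              per_background: float = 0.0,
                              cbp20_background: float = 0.0) -> float:
    """Band intensity of a per isoform divided by that of cbp20.

    Each band is first corrected by subtracting its own background value.
    """
    per_corrected = per_band - per_background
    reference = cbp20_band - cbp20_background
    if reference <= 0:
        raise ValueError("corrected cbp20 intensity must be positive")
    return per_corrected / reference


def mean_of_replicates(ratios):
    """Average the ratios from independent extractions of one collection."""
    return mean(ratios)


if __name__ == "__main__":
    # Invented pixel intensities for three replicate gels.
    replicates = [
        normalized_band_intensity(1200, 2100, 200, 100),
        normalized_band_intensity(1000, 2100, 200, 100),
        normalized_band_intensity(1400, 2100, 200, 100),
    ]
    print(replicates, mean_of_replicates(replicates))
```

Normalizing to a control gene compensates for differences in the amount of cDNA loaded, so the ratio rather than the raw band intensity is what can be compared across temperatures.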

Autism spectrum disorders (ASD) are believed to have genetic and
environmental origins, yet in only a modest fraction of individuals 
can specific causes be identified’”. To identify further genetic risk 
factors, here we assess the role of de novo mutations in ASD by 
sequencing the exomes of ASD cases and their parents (n= 175 
trios). Fewer than half of the cases (46.3%) carry a missense or 
nonsense de novo variant, and the overall rate of mutation is only 
modestly higher than the expected rate. In contrast, the proteins 
encoded by genes that harboured de novo missense or nonsense 
mutations showed a higher degree of connectivity among themselves 
and to previous ASD genes’ as indexed by protein-protein
interaction screens. The small increase in the rate of de novo events, when
taken together with the protein interaction results, is consistent
with an important but limited role for de novo point mutations in 
ASD, similar to that documented for de novo copy number variants. 
Genetic models incorporating these data indicate that most of the 
observed de novo events are unconnected to ASD; those that do 
confer risk are distributed across many genes and are incompletely 
penetrant (that is, not necessarily sufficient for disease). Our results 
support polygenic models in which spontaneous coding mutations 
in any of a large number of genes increase risk by 5- to 20-fold.
Despite the challenge posed by such models, results from de novo 
events and a large parallel case-control study provide strong 
evidence in favour of CHD8 and KATNAL2 as genuine autism 
risk factors. 

In spite of the substantial heritability, few genetic risk factors for 
ASD have been identified’*. Copy number variants (CNVs), in par- 
ticular de novo and large events spanning multiple genes, have been 
identified as conferring risk**. Although these CNVs provide import- 
ant leads to underlying biology, they rarely implicate single genes, are 
rarely fully penetrant, and many confer risk to a broad range of con- 
ditions including intellectual disability, epilepsy and schizophrenia’. 
There are also documented instances of rare single nucleotide variants 
(SNVs) that are highly penetrant for ASD*. 

Large-scale genetic studies make clear that the origins of ASD risk 
are multifarious, and recent estimates based on CNV data put the 


number of independent risk loci in the hundreds°. Yet knowledge 
regarding specific risk-determining genes and the overall genetic 
architecture for ASD remains incomplete. Although new sequencing 
technologies provide a catalogue of most variation in the genome, the 
profound locus heterogeneity of ASD makes it challenging to
distinguish variants that confer risk from the background noise of
inconsequential SNVs. De novo variation, being less frequent and
potentially more deleterious, could offer insights into risk-determining 
genes. Accordingly, we sought to evaluate carefully the observed rate 
and consequence of de novo point mutations in the exomes of ASD 
subjects. 

We performed exome sequencing of 175 ASD probands and their 
parents across five centres with multiple protocols and validation 
techniques (Supplementary Information). We used a sensitive and 
specific analytical pipeline based on current best practices to analyse 
all data and observed no heterogeneity of mutation rate across centres. 

In the entire sample, we observed 161 coding region point muta- 
tions (101 missense, 50 silent and 10 nonsense), with an additional two 
conserved splice site (CSS) SNVs and six frameshift insertions/ 
deletions (indels) validated and included in pathway analyses 
(Supplementary Table 1). 

To determine whether the rate of coding region point mutations was 
elevated, we estimated the mutation rate in light of coverage and base 
context using two parallel approaches (Supplementary Information). 
On the basis of both models, the exome target should have a signifi- 
cantly increased (~30%) mutation rate compared to the genome. 
Conservatively, by assuming the low end of the estimated mutation 
rate from recent whole-genome data (1.2 × 10⁻⁸), we estimate a 
mutation rate of 1.5 × 10⁻⁸ for the exome sequence captured here. 
The observed point mutation rate of 0.92 per exome is slightly but not 
significantly elevated versus expectation (Table 1) and is insensitive to 
adjustment for lower coverage regions (Supplementary Information). 
Indeed our rate is similar to that of ref. 11. 

Per-family events were distributed exquisitely according to the 
Poisson distribution (Table 1), suggesting limited variation in the 
underlying rate of de novo mutation in ASD families. The relative rates 
of ‘functional’ (missense, nonsense, CSS and read-through) versus 
silent changes did not deviate from expectation (Table 2). We did, 
however, observe ten nonsense mutations (6.2%), which exceeded 
expectation (3.3%) (one-tailed P = 0.04; Supplementary Information). 

Table 1 | Distribution of events per family 

Events per    All ASD trios              Random 
family        Exon DN SNVs*    Exp.†     mut. exp.‡ 
0             71               69.7      73.2 
1             62               64.2      63.8 
2             28               29.5      27.8 
3             10               9.1       8.1 
4             2                2.1       1.8 
5             1                0.4       0.3 
Mean          0.920                      0.871 

* Exon DN SNVs include all single nucleotide variants in coding sequence but exclude indels and 
intronic variants. 
† The expected distribution of number of trios with a given event count as determined by the Poisson. 
‡ Random mut. exp. is the expectation for 175 trios based on the sequence-context mutation rate model 
M1 (Supplementary Information) based on the count of the number of trios that have at least 10× 
coverage. 
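
The Poisson expectation in Table 1 and the nonsense excess quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a plain Poisson model with the observed mean of 0.92 events across 175 trios, and a simple binomial upper tail for 10 nonsense events among 161 coding mutations at the stated 3.3% expectation. The function names are hypothetical, and the authors' exact test is described only in their Supplementary Information.

```python
import math


def poisson_expected_counts(mean, n_families, max_events):
    """Expected number of families with k events, for k = 0..max_events."""
    return [
        n_families * math.exp(-mean) * mean**k / math.factorial(k)
        for k in range(max_events + 1)
    ]


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


# Values taken from the text: mean 0.92 events per exome, 175 trios.
expected = poisson_expected_counts(mean=0.92, n_families=175, max_events=5)

# Assumed test: binomial tail for 10 nonsense events out of 161 mutations.
nonsense_p = binomial_upper_tail(k=10, n=161, p=0.033)

for events, count in enumerate(expected):
    print(f"{events} events: {count:.1f} families expected")
print(f"one-tailed P for nonsense excess: {nonsense_p:.3f}")
```

Under these assumptions the expected counts agree with the Exp. column of Table 1 to one decimal place, and the binomial tail is close to the reported one-tailed P of 0.04.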

We examined missense mutations using PolyPhen-2 scores to 
measure severity, as some missense variants can severely affect function. 
These scores showed no deviation from random expectation. 
The observed PolyPhen-2 scores clearly deviate from standing vari- 
ation in the parents (Table 2), but such variation, even the rarest 
category, has survived selective pressure and so is inappropriate for 
comparison to de novo events. 

We observed three genes with two de novo mutations: BRCA2 (two 
missense), FAT1 (two missense) and KCNMA1 (one missense, one 
silent). A gene with two or more non-synonymous de novo hits across 
a panel of trios might indicate strong candidacy. However, simulations 
(Supplementary Information) show that two such hits are inadequate 
to define a gene as a conclusive risk factor given the number of 
observed events in the study. 

From analyses of secondary phenotypes (Supplementary Tables 2 
and 3), the most striking result is that paternal and maternal age, 
themselves highly correlated (r = 0.679, P < 0.0001), each 
strongly predicts the number of de novo events per offspring (paternal 
age, P = 0.0013; maternal age, P = 0.000365), consistent with aggregating 
mutations in germ cells in the paternal line. Consistent with a 
liability threshold model, there is an increased rate of de novo mutation 
in female versus male cases (1.214 for females versus 0.914 for males); 
however, the difference is not significant, owing to limited sample size. 
Considering phenotypic correlates, we observed no rate difference 
between subjects with strict autism versus those with a broader ASD 
classification, between positive and negative family history, or any 
significant effect of de novo mutation on verbal, non-verbal or full- 
scale IQ (Supplementary Table 3). 

Given that hundreds of loci are apparently involved in autism and 
de novo mutations therein affect ASD risk, we modelled different 
numbers of risk genes and penetrances (Supplementary Information) 
and show that a model of hundreds of genes with high penetrance 
mutations is excluded by our data; however, more modest contributions 
of de novo variants are not. For example, up to 20% of cases 
carrying a de novo event conferring a 10- or 20-fold increased risk is 
consistent with these data (Supplementary Table 4). Thus, our data are 
consistent with either chance mutation or a modest role for de novo 
mutations on risk. Importantly, a single deleterious event is unlikely to 
fully explain disease in a patient. 

Table 2 | Rates of mutation annotation given variant type 

Type of de novo      De novo   Random        Singletons   Doubletons   ≥3 
mutation             (%)*      de novo (%)   (%)†         (%)†         (%)† 
Missense             62.7      66.1          59.5         55.4         48.8 
Nonsense             6.2       3.3           1.2          0.8          0.4 
Synonymous           31.1      30.6          39.3         43.8         50.8 
PolyPhen-2 missense classification 
Benign               35.0      35.9          46.6         51.3         63.4 
Possibly damaging    21.0      18.9          18.8         17.7         15.1 
Probably damaging    44.0      45.2          34.7         31.0         21.4 

* All indels and failing variants were removed. 
† Singletons, doubletons and ≥3 (copies) are only those variants called in 192 parents. 

2 | NATURE | VOL 000 | 00 MONTH 2012 

We therefore posed two questions of the group of genes harbouring 
de novo functional mutations: do the protein products of these genes 
interact with each other more than expected, and are they unusually 
enriched in, or connected to, previous curated lists of ASD-implicated 
genes? Using an in silico approach (DAPPLE), the protein-protein 
connectivity defined by InWeb in the set of 113 genes harbouring 
functional de novo mutations was evaluated. These analyses (Fig. 1) 
showed significantly greater connectivity among the de novo identified 
proteins than would be expected by chance (P < 0.001) (Supplemen- 
tary Information). 

Querying previously defined, manually curated lists of genes associated 
with high risk for ASD with or without intellectual disability 
(Supplementary Table 5), and high-risk intellectual disability genes 
(Supplementary Table 6), we asked whether there was significant 
enrichment for de novo mutations in these genes. Five genes with 
functional de novo events were previously associated with ASD and/ 
or intellectual disability (STXBP1, MEF2C, KIRREL3, RELN and 
TUBA1A); for four of these genes (all but RELN) the previous evidence 
indicated autosomal dominant inheritance. 

We then assessed the average distance (D_i, Supplementary Fig. 2) of 
the de novo coding variants in brain-expressed genes (see supplement) 
to the ASD/intellectual disability list using a protein-protein 
interaction background network. To enhance power, data from a companion 
study were used, including the observed silent de novo variants 
and de novo variants in unaffected siblings as comparators. The average 
distance for non-synonymous variants was significantly smaller for the 
case set than the comparator set (3.66 ± 0.42 versus 3.78 ± 0.59; 
permutation P = 0.033) (Supplementary Fig. 3). Much of this signal 
comes from 31 synaptic genes identified by three large-scale synaptic 
proteomic studies (D_i = 3.47 ± 0.46 versus 3.57 ± 0.60; permutation 
P = 0.084) (Fig. 2; see also Supplementary Fig. 4 for the complete data). 
Taken in total, these independent gene set analyses, along with the 
modest enrichment of de novo variants over background rates in 
ASD, indicate that a proportion of the de novo events observed in this 
study probably contribute to autism risk. 

Using whole-exome sequencing of autism trios, we demonstrate a 
rate, functional distribution and predicted impact of de novo mutation 
largely consistent with chance mutational processes governed by 
sequence context. This lack of significant deviation from random 
mutational processes indicates a more limited role for the contribution 
of de novo mutations to ASD pathogenesis than has previously been 
suggested, and specifically highlights the fact that observing a single 
de novo mutation, even an apparently ‘severe’ loss-of-function allele, is 
insufficient to implicate a gene as a risk factor. Yet the pathway 
analyses presented here assert that the overall set of genes hit with 
functional de novo mutations is not random and that these genes are 
biologically related to each other and to previously identified 
ASD/intellectual disability candidate genes. Modelling the de novo 
mutational process under a range of genetic models reveals that some 
models are inconsistent with the observed data (for example, 100 rare, 
fully penetrant Mendelian genes similar to Rett’s syndrome), whereas 
others are not inconsistent, such as spontaneous ‘functional’ mutation 
in hundreds of genes that would increase risk by 10- or 20-fold 
(Supplementary Table 4). Models that fit the data are consistent with 
the relative risks estimated for most de novo CNVs and suggest that de 
novo SNVs, like most CNVs, often combine with other risk factors 
rather than fully cause disease. Furthermore, these models indicate 
that de novo SNV events will probably explain <5% of the overall 
variance in autism risk (Supplementary Table 4). 

Figure 1 | Protein-protein interaction for genes with an observed 
functional de novo event. Direct protein connections from InWeb, restricting 
to genes harbouring de novo mutations for DAPPLE analysis. Two extensive 
networks are identified: the first is centred on SMARCC2 with 12 connections 
across 11 genes; the second is centred on FN1 with 7 connections across 6 genes. 
The P value for each gene having as many connections as those observed is 
indicated by node colour. 

Figure 2 | Direct and indirect protein-protein interaction for genes with a 
functional de novo event and previous ASD genes. PPI network analysis for 
de novo variants and 31 previous synaptic ASD genes (see Supplementary 
Information). Nodes are sized based on connectivity. Genes harbouring de novo 
variants (left) and previous ASD genes (right) are coloured blue, with dark blue 
nodes representing genes that belong to one of these lists and are also 
intermediate proteins. Intermediate proteins (centre) are coloured in shades of 
orange based on a P value computed using a proportion test, where a darker 
colour represents a lower P value. Green edges represent direct connections 
between genes harbouring de novo variants (left) and previous ASD genes. All 
other edges, connecting to intermediate proteins, are shown in grey. 

Considering the two companion papers, 18 genes with two functional 
de novo mutations are observed in the complete data. Using 
simulations, 11.91 genes on average harbour functional mutations 
by chance (Supplementary Table 7). Thus, a set of 18 genes with two 
or more hits is not quite significant (P = 0.063). Matching loss-of- 
function variants, however, at SCN2A, KATNAL2 and CHD8 (Sup- 
plementary Table 7) are unlikely to occur by chance because of the 
expected very low rate of de novo nonsense, splice and frameshift 
variants. We evaluated these strong candidates further using exome 
sequencing on 935 cases and 870 controls, and at both KATNAL2 and 
CHD8 three additional loss-of-function mutations were observed in 
cases with none in controls. No additional loss-of-function mutations 
were seen at SCN2A in the case-control data, but a new splice site de 
novo event has been validated in an additional autism case while this 
paper was in press, strengthening the evidence for this gene as relevant 
to autism. Using data from more than 5,000 individuals in the NHLBI 
Exome Variant Server (http://evs.gs.washington.edu/EVS/) as additional 
controls, three loss-of-function mutations were seen in 
KATNAL2 but none in CHD8, making the additional observation of 
three CHD8 loss-of-function mutations in our cases significant evidence 
(P < 0.01) of this being a genuine autism susceptibility gene. Not 
all genes with double hits are nearly so promising (Supplementary 
Information and Supplementary Tables 8 and 9), supporting the 
estimate above that most of such observations are simply chance 
events. Overall, these data underscore the challenge of establishing 
individual genes as conclusive risk factors for ASD, a challenge that 
will require larger sample sizes and deeper analytical integration with 
inherited variation. 

©2012 Macmillan Publishers Limited. All rights reserved 


METHODS SUMMARY 

We ascertained probands using the Autism Diagnostic Interview-Revised (ADI- 
R), the Autism Diagnostic Observation Schedule-Generic (ADOS) and the DSM- 
IV diagnosis of a pervasive developmental disorder. All probands met criteria for 
autism on the ADI-R and either autism or ASD on the ADOS, except for the three 
subjects that were not assessed with the ADOS. All subjects provided informed 
consent and the research was approved by institutional human subjects boards. 

For 175 trios, we performed exome capture and sequencing using either the 
Agilent 38 Mb SureSelect v2 (n = 118), the NimbleGen Seq Cap EZ SR v2 (n = 51), 
or NimbleGen VCRome 2.1 (Baylor n = 6). After capture, another round of 
LM-PCR was performed to increase the quantity of DNA available for sequencing. 
All libraries were sequenced using an Illumina HiSeq 2000. 

All sequence data were processed with Picard (http://picard.sourceforge.net/), 
which performs base quality-score recalibration and local realignment at known 
indels, and BWA for mapping reads to hg19. SNPs were called using GATK for all 
trios jointly. Putative de novo mutations were identified as sites that passed 
standard filters and at which both parents were homozygous for the reference 
sequence, the offspring was heterozygous, and each genotype call was made 
confidently (see Supplementary Information). 

All putative de novo events were validated by sequencing the carrier and both 
parents using Sanger sequencing methods (71 trios) or by using Sequenom 
MALDI-TOF (104 trios). All events were annotated using RefSeq hg19. 

We modelled a Poisson process consistent with the mutation model and 
observed data. We varied the fraction of genes that influence risk, the probability 
of a functional variant, and the penetrance of said events. 

We performed association tests using SKAT, a generalization of C-alpha. 
Our primary analyses treat case-control data generated at Baylor and Broad 
sequencing centres separately (23 genes × 2 sites), but we also performed mega- 
and meta-analyses (23 genes × 2 methods). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Phenotype assessment. Affected probands were assessed by research-reliable 
research personnel using Autism Diagnostic Interview-Revised (ADI-R), and 
the Autism Diagnostic Observation Schedule-Generic (ADOS) and DSM-IV 
diagnosis of a pervasive developmental disorder was made by a clinician. All 
probands met criteria for autism on the ADI-R and either autism or ASD on 
the ADOS, except for the three subjects from AGRE that were not assessed with 
the ADOS. In all, 85% of probands were classified with autism on both the ADI-R 
and ADOS. All subjects provided informed consent and the research was approved 
by institutional human subjects boards. 

Exome sequencing, variant identification and de novo detection. Exome 
capture and sequencing were performed at each site using similar methods. 
Exons were captured using the Agilent 38 Mb SureSelect v2 (University of 
Pennsylvania and Broad Institute n = 118), the NimbleGen Seq Cap EZ SR v2 
(Mt Sinai School of Medicine, Vanderbilt University n = 51), or NimbleGen 
VCRome 2.1 (Baylor n = 6). After capture, another round of LM-PCR was 
performed to increase the quantity of DNA available for sequencing. All 
libraries were sequenced using an Illumina HiSeq 2000. 

Sequence processing and variant calling were performed using a similar 
computational workflow at all sites. Data were processed with Picard 
(http://picard.sourceforge.net/), which uses base quality-score recalibration 
and local realignment at known indels, and BWA for mapping reads to hg19. SNPs 
were called using GATK for all trios jointly. The variable sites that we have 
considered in analysis are restricted to those that pass GATK standard filters. 
From this set of variants, we identified putative de novo mutations as sites 
where both parents were homozygous for the reference sequence, the offspring 
was heterozygous, and each genotype call was made confidently (see 
Supplementary Information). 
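
The de novo rule described above (site passes filters, both parents homozygous reference, offspring heterozygous, all three calls confident) can be written as a small predicate. This is a minimal sketch, not the authors' pipeline: the `GenotypeCall` class, the quality scale and the `min_quality` threshold of 20 are hypothetical stand-ins for criteria that are specified only in the Supplementary Information.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GenotypeCall:
    """One sample's genotype at a site (hypothetical container)."""
    alleles: Tuple[int, int]  # allele indices, 0 = reference
    quality: float            # genotype confidence on an assumed scale


def is_hom_ref(call: GenotypeCall) -> bool:
    return call.alleles == (0, 0)


def is_het(call: GenotypeCall) -> bool:
    # Exactly one reference allele and one non-reference allele.
    return call.alleles.count(0) == 1


def is_putative_de_novo(child, father, mother,
                        site_passes_filters=True, min_quality=20.0):
    """Apply the trio rule; min_quality is an assumed threshold."""
    if not site_passes_filters:
        return False
    if any(c.quality < min_quality for c in (child, father, mother)):
        return False
    return is_het(child) and is_hom_ref(father) and is_hom_ref(mother)


child = GenotypeCall((0, 1), 60.0)
parent = GenotypeCall((0, 0), 55.0)
print(is_putative_de_novo(child, parent, parent))
```

Candidates that pass a rule of this kind are still only putative; in the study every such event was confirmed by an independent validation assay.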
Validation of de novo events. Putative de novo events were validated by 
sequencing the carrier and both parents using Sanger sequencing methods 
(University of Pennsylvania, Mt Sinai School of Medicine, Vanderbilt 
University, Baylor Medical College) or by Sequenom MALDI-TOF genotyping 
of trios (Broad). 

Gene annotation. All identified mutations were then annotated using RefSeq 
hg19. The functional impact of variants was assessed for all isoforms of each gene, 
with the most severe annotation taking priority. Splice site variants were identified 
as occurring within two base pairs of any intron/exon boundary. 
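
The two annotation rules in this paragraph can be sketched as follows. Both details are assumptions made for illustration: the severity ranking is hypothetical (the text says only that the most severe annotation takes priority), and "within two base pairs of any intron/exon boundary" is interpreted here as the two intronic bases flanking each exon, using 1-based inclusive exon coordinates.

```python
# Hypothetical ranking from least to most severe; not taken from the source.
SEVERITY_ORDER = ["silent", "missense", "splice_site", "nonsense"]


def most_severe(annotations):
    """Pick the most severe annotation across all isoforms of a gene."""
    return max(annotations, key=SEVERITY_ORDER.index)


def is_splice_site(position, exon_start, exon_end):
    """True if position is one of the two intronic bases on either side
    of the exon (assumed reading of the two-base-pair rule)."""
    upstream = exon_start - 2 <= position < exon_start
    downstream = exon_end < position <= exon_end + 2
    return upstream or downstream


print(most_severe(["silent", "missense", "nonsense"]))
print(is_splice_site(201, exon_start=100, exon_end=200))
```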

Expectation of de novo mutation calculation. To calculate the expected de novo 
rate, we assessed the mutability of all possible trinucleotide contexts in the 
intergenic region of the human genome for variation in two fashions: fixed 
genomic differences compared to chimpanzee and baboon, and variation identified 
from the 1000 Genomes Project. The overall mutation rate for the exome was then 
determined by summing the probability of mutation for all bases in the exome that 
were captured successfully. We also determined the probability of each class 
of functional mutation by summing the annotated variants. 
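
The summation described here can be illustrated with a toy example. The sketch assumes a lookup table of per-base mutation probabilities keyed by trinucleotide context and a per-base coverage mask; the rates below are invented placeholders, not values from the study.

```python
def expected_mutations(sequence, context_rates, covered):
    """Sum the mutation probability of every covered base, where each base's
    probability depends on its trinucleotide context (base plus neighbours).
    The first and last bases are skipped because they lack a full context."""
    total = 0.0
    for i in range(1, len(sequence) - 1):
        if covered[i]:
            total += context_rates[sequence[i - 1:i + 2]]
    return total


# Toy input: placeholder rates per context, per generation.
toy_sequence = "ACGTA"
toy_rates = {"ACG": 2e-8, "CGT": 3e-8, "GTA": 1e-8}

all_covered = expected_mutations(toy_sequence, toy_rates, [True] * 5)
one_missing = expected_mutations(
    toy_sequence, toy_rates, [True, True, False, True, True]
)
print(all_covered, one_missing)
```

Restricting the sum to successfully captured bases is what makes the expectation specific to each exome target rather than to the exome in general.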

Pathway analyses. We applied DAPPLE, which uses the InWeb database, to 
determine whether there is excess protein-protein interaction across the genes hit 
by a functional de novo event. We also assessed whether these genes were more 
closely connected to a list of ASD genes. 

Modelling de novo events. We modelled a Poisson process consistent with the 
expected distribution defined by the mutation model and with the observed data. 
We varied the fraction of genes that influence risk, the probability a variant in a 
gene would be functional, and the penetrance of functional de novo events. We also 
simulated a random set of de novo events to estimate the probability of hitting a 
gene multiple times. 
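
The multiple-hit simulation mentioned at the end of this paragraph can be sketched as a simple Monte Carlo procedure. Everything specific here is an assumption: each gene's chance of being hit is taken to be proportional to a weight (standing in for its mutability), events are drawn independently, and the gene names and weights in the example are invented.

```python
import random
from collections import Counter


def simulate_multi_hit_genes(gene_weights, n_events, n_sims, seed=0):
    """For each simulated data set, count genes hit two or more times when
    n_events mutations are dropped on genes in proportion to their weights."""
    rng = random.Random(seed)
    genes = list(gene_weights)
    weights = [gene_weights[g] for g in genes]
    counts = []
    for _ in range(n_sims):
        hits = Counter(rng.choices(genes, weights=weights, k=n_events))
        counts.append(sum(1 for c in hits.values() if c >= 2))
    return counts


def empirical_p(counts, observed):
    """Fraction of simulations with at least `observed` multi-hit genes."""
    return sum(c >= observed for c in counts) / len(counts)


# Invented example: 1,000 equally mutable genes, 100 random events.
toy_weights = {f"gene{i}": 1.0 for i in range(1000)}
toy_counts = simulate_multi_hit_genes(toy_weights, n_events=100, n_sims=200)
print(sum(toy_counts) / len(toy_counts), empirical_p(toy_counts, observed=10))
```

Comparing an observed number of multi-hit genes with the simulated distribution gives an empirical P value of the kind reported for the 18 double-hit genes.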

Association analysis. We performed association tests using SKAT, a 
generalization of C-alpha. Our primary analyses treat case-control data generated at 
Baylor and Broad sequencing centres separately (23 genes × 2 sites), but we also 
performed mega- and meta-analyses (23 genes × 2 methods). 

Decoding post-transcriptional regulatory programs in RNA is a 
critical step towards the larger goal of developing predictive 
dynamical models of cellular behaviour. Despite recent efforts, 
the vast landscape of RNA regulatory elements remains largely 
uncharacterized. A long-standing obstacle is the contribution of 
local RNA secondary structure to the definition of interaction 
partners in a variety of regulatory contexts, including (but not 
limited to) transcript stability, alternative splicing and localization. 
There are many documented instances where the presence 
of a structural regulatory element dictates alternative splicing 
patterns (for example, human cardiac troponin T) or affects other 
aspects of RNA biology. Thus, a full characterization of 
post-transcriptional regulatory programs requires capturing information 
provided by both local secondary structures and the 
underlying sequence. Here we present a computational framework 
based on context-free grammars and mutual information 
that systematically explores the immense space of small structural 
elements and reveals motifs that are significantly informative of 
genome-wide measurements of RNA behaviour. By applying this 
framework to genome-wide human mRNA stability data, we reveal 
eight highly significant elements with substantial structural 
information, for the strongest of which we show a major role in 
global mRNA regulation. Through biochemistry, mass spectrometry 
and in vivo binding studies, we identified human HNRPA2B1 
(heterogeneous nuclear ribonucleoprotein A2/B1, also known as 
HNRNPA2B1) as the key regulator that binds this element and 
stabilizes a large number of its target genes. We created a global 
post-transcriptional regulatory map based on the identity of the 
discovered linear and structural cis-regulatory elements, their 
regulatory interactions and their target pathways. This approach 
could also be used to reveal the structural elements that modulate 
other aspects of RNA behaviour. 

To isolate stability from other aspects of mRNA behaviour, we 
performed whole-genome mRNA stability measurements by incub- 
ating human MDA-MB-231 breast cancer cells in the presence of 
4-thiouridine, which is efficiently incorporated into cellular RNA. 
Subsequently, 4-thiouridine-labelled transcripts were captured and 
quantified at different time-points after the removal of 4-thiouridine 
from the growth medium. We calculated a relative decay rate for each 
transcript based on the rate at which 4-thiouridine-labelled transcripts, 
in the absence of 4-thiouridine in the media, are replaced by newly 
synthesized unlabelled mRNAs in the population (Supplementary Fig. 1). 
These measurements were then used to identify the putative cis- 
regulatory elements (linear and structural) that underlie transcript 
stability. A number of methods have been previously introduced for 
discovering structural motifs mainly based on free energy minimization, 
local sequence alignments or a combination of both alignments 
and secondary structure predictions. However, the extent to which 
these in silico predictions reflect stable in vivo molecular conformations 
has not been fully explored. In fact, the RNA binding proteins 
and complexes that interact with their target transcripts may facilitate 
the formation of secondary structures in vivo. Thus, we sought to 
bypass the need for predicting thermodynamically stable secondary 
structures by efficiently enumerating a large space of potential struc- 
tural motifs. We developed TEISER (Tool for Eliciting Informative 
Structural Elements in RNA), a framework for identifying the struc- 
tural motifs that are informative of whole-genome measurements 
across all the transcripts. In this approach, structural motifs are defined 
in terms of context-free grammars (CFGs) that represent hairpin 
structures as well as primary sequence information (see Methods 
and Supplementary Fig. 2). TEISER employs mutual information to 
measure the regulatory consequences of the presence or absence of 
each of roughly 100 million different seed CFGs (see Methods). 
Mutual information is a robust non-parametric measure that reveals 
general dependencies across discrete or continuous measurements. 
For example, when applied to the transcript stability data, TEISER 
captures the dependency between the stability of each mRNA and 
the presence or absence of a given structural motif in its 5’ and 3’ 
untranslated regions (UTRs). TEISER, subsequently, uses these mea- 
surements to choose and further refine the most informative motifs, 
and performs a series of statistical tests—for example, randomization- 
based statistics and jackknifing tests—to achieve very low (<0.01) 
false-discovery rates (see Methods and Supplementary Fig. 2). 
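
The core scoring idea, mutual information between motif presence and binned stability followed by a shuffle-based z score, can be sketched in plain Python. This is not TEISER's implementation: the function names, the rank-based binning and the number of shuffles are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def quantize(values, n_bins):
    """Assign each value to one of n_bins equally populated bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mi_z_score(presence, bins, n_shuffles=500, seed=0):
    """Standard deviations separating the observed MI from the MI obtained
    on randomly shuffled stability profiles."""
    rng = random.Random(seed)
    observed = mutual_information(presence, bins)
    shuffled = list(bins)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(presence, shuffled))
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((v - mean) ** 2 for v in null) / (len(null) - 1))
    return (observed - mean) / sd


# Toy data: the motif is present only in the 20 most stable of 40 transcripts.
toy_presence = [1] * 20 + [0] * 20
toy_bins = quantize(list(range(40)), n_bins=4)
toy_mi = mutual_information(toy_presence, toy_bins)
toy_z = mi_z_score(toy_presence, toy_bins)
print(toy_mi, toy_z)
```

Because mutual information makes no assumption about the shape of the dependency, the same score captures enrichment at either end of the stability range or in the middle.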

Application of TEISER to the mRNA stability measurements in 
MDA-MB-231 cells revealed eight strong structural motif predictions 
that passed our statistical tests aimed at finding the most likely ele- 
ments causally involved in mRNA stability (Fig. 1 and Supplementary 
Fig. 3). Apart from being highly informative of mRNA stability mea- 
surements, these putative regulatory elements show a variety of other 
characteristics that support their functionality. For example, four of 
the discovered motifs are also informative of transcript stability 
measurements in mouse (Supplementary Fig. 4a). Furthermore, these 
motifs are highly conserved between human and mouse genomes 
(see Methods and Supplementary Fig. 3) and are also informative of 
co-expression clusters discovered across independent whole-genome 
data sets (Supplementary Fig. 4b). 

Among the putative structural motifs discovered by TEISER, we 
chose sRSM1 (structural RNA stability motif 1), the most statistically 
significant 3’ UTR element (z-score = 122), for further analysis. In 
order to probe the functionality of sRSM1 instances across the genome, 
we performed in vivo titration experiments using synthetic 
oligonucleotides. Upon transfecting MDA-MB-231 cells with decoy RNA 
molecules harbouring sRSM1 instances (Supplementary Fig. 5), we 
observed a notable reduction in the level of endogenous transcripts 
that carried this motif, in comparison to their level in the control 
cells transfected with scrambled RNA molecules (Fig. 2). This global 
downregulation points to the presence of a trans-acting factor that, 
upon interaction with sRSM1, stabilizes its target transcripts. The 
decoy (synthetic) sRSM1 elements compete with endogenous 
mRNAs for the putative trans-acting factor, which results in the 
observed reduction in the level of its target mRNAs. Furthermore, 
reporter constructs carrying instances of sRSM1 showed a marked 
decrease in transcript decay rate in comparison to scrambled controls, 
further suggesting a direct role for this structural element in transcript 
stability (Supplementary Fig. 6). 

Figure 1 | Discovery of RNA structural motifs informative of genome-wide 
transcript stability. Each RNA structural motif is shown (far right) along with 
its pattern of enrichment/depletion across the range of mRNA stability 
measurements throughout the genome (far left). The panel labelled mRNA 
stability measurements shows how the transcripts are partitioned into equally 
populated bins based on their stability measures, going from left (highly stable) 
to right (unstable). In the heatmap representation, a gold entry marks the 
enrichment of the given motif in its corresponding stability bin (measured by 
log-transformed hypergeometric P-values), while a light-blue entry indicates 
motif depletion in the bin. Red and blue borders mark highly significant motif 
enrichments and depletions, respectively. From left to right, we show the motif 
names, their location (UP for 5’ UTR and DN for 3’ UTR), their sequence 
information (‘motif’, in the form of an alphanumeric plot), their associated 
mutual information values (MI; see below), their frequency (the fraction of 
transcripts that carry at least one instance of the motif), and their z score (see 
below). Each MI value is used to calculate a z score, which is the number of 
standard deviations of the actual MI relative to MIs calculated for 1.5 million 
randomly shuffled stability profiles. A structural illustration of each motif is 
also presented (far right) using the following single letter nucleotide code: 
Y = [UC], R = [AG], K = [UG], M = [AC], S = [GC], W = [AU], B = [GUC], 
D = [GAU], H = [ACU], V = [GCA] and N = any nucleotide. 

Figure 2 | The regulatory role of sRSM1. Whole-genome expression levels 
were measured in decoy-transfected samples relative to the controls transfected 
with scrambled RNA molecules (see Methods). The measurements were 
performed in duplicate, for two independent decoy/scrambled sets (the relative 
transcript levels were subsequently averaged across the two replicates in each 
set). Genes were sorted and quantized into equally populated bins based on the 
average log-ratio of their expression levels in the decoy samples relative to the 
scrambled controls. TEISER was used to show the enrichment/depletion 
patterns of transcripts harbouring sRSM1 in their 3’ UTRs. From left to right, 
we also show motif name, sequence, MI values and the associated z scores. 
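
The single-letter nucleotide code listed in the Figure 1 legend maps directly onto regular-expression character classes. The sketch below covers only the linear sequence part of a motif; it does not model the base-pairing constraints of the hairpin, and the example motif string is invented rather than taken from the study.

```python
import re

# Degenerate RNA codes as given in the Figure 1 legend.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "[UC]", "R": "[AG]", "K": "[UG]", "M": "[AC]",
    "S": "[GC]", "W": "[AU]", "B": "[GUC]", "D": "[GAU]",
    "H": "[ACU]", "V": "[GCA]", "N": "[ACGU]",
}


def motif_to_regex(motif):
    """Compile a degenerate RNA motif into a regular expression."""
    return re.compile("".join(IUPAC_RNA[code] for code in motif.upper()))


pattern = motif_to_regex("YRN")  # invented three-base motif
print(bool(pattern.search("GGCAUGG")))
```

Scanning UTR sequences with such a pattern yields the presence or absence calls that feed the mutual information score.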

We used streptomycin-binding RNA aptamer immobilization 
coupled with mass spectrometry to discover candidates that bind, 
in vitro, to the decoy instances of sRSM1, but not to the scrambled 
versions (Supplementary Fig. 7). After isolation under stringent 
conditions and in-solution digestion of RNA-bound proteins followed by 
nanoliquid chromatography-tandem mass spectrometry, we identified 
HNRPA2B1 as a promising candidate (Supplementary Table 1). This 
RNA-binding protein is a member of the A/B subfamily of heterogeneous 
nuclear ribonucleoproteins (hnRNPs) and carries two repeats of 
quasi-RNA-recognition motif (qRRM) RNA binding domains 
(Supplementary Fig. 8). Moreover, the established roles of other 
members of this family, namely HNRNPD and HNRNPA1, in regulating 
RNA stability and binding terminal stem-loops further suggest 
HNRPA2B1 as a functional regulator. Also, more than 4,000 
transcripts carry potentially functional instances of sRSM1 (see 
Methods), implicating this motif as a major global regulator of 
mRNA stability. The HNRPA2B1 transcript, at the same time, is highly 
abundant in the cell (one standard deviation higher than average), 
thus making it a promising candidate for global modulation of mRNA 
stability through sRSM1. 

In order to directly assess the regulatory consequences of modulating 
HNRPA2B1, we performed knock-down experiments followed by 
gene expression profiling. Consistent with our prior observations, 
HNRPA2B1 knock-down caused a significant decrease in the expres- 
sion level of transcripts carrying sRSM1 (Fig. 3a). Stability measure- 
ments in the knock-down cells confirmed that the observed 
downregulation of these transcripts was in fact due to changes in 
stability (see Methods), with the transcripts carrying sRSM1 elements 
showing a marked increase in their corresponding relative decay rates 
(Fig. 3b). 

In principle, our observations are consistent with a possible indirect 
role for HNRPA2B1—brought about, for instance, by a common partner 
that binds both HNRPA2B1 and sRSM1 sites. The direct interaction 
between HNRPA2B1 and its potential target genes can be tested 
through cross-linking and immunoprecipitation of HNRPA2B1, 
which, through local ultraviolet photoreactivity of bases and amino 
acids, can detect direct physical interactions. We expressed a tagged 
clone of HNRPA2B1 in MDA-MB-231 cells, and after ultraviolet- 
crosslinking, immunoprecipitated this protein and the target mRNA 
molecules that were bound to it. We then labelled the isolated RNA 
population and hybridized it to microarrays with the input total RNA 
as control (a method called RIP-chip”’). We observed a highly signifi- 
cant enrichment of sRSM1 in the immunoprecipitated population 
(Fig. 3c). In order to reduce the background and better pinpoint the 
HNRPA2B1 binding sites, we treated the samples with nuclease before 
immunoprecipitation under denaturing conditions and sequenced the 
HNRPA2B1-bound RNA population (HITS-CLIP”’). We observed that 
sRSM1 elements were significantly enriched in the identified putative 
binding sites, in comparison with randomly selected sequences”! 
(Fig. 3d). These observations demonstrate that HNRPA2B1 directly 

Figure 3 | HNRPA2B1 stabilizes transcripts through direct in vivo binding
to sRSM1 structural motifs. a, Genome-wide expression levels were measured 
in HNRPA2B1 siRNA-transfected samples relative to mock-transfected 
controls. TEISER was used to capture the enrichment/depletion pattern of 
transcripts carrying sRSM1 across the relative expression values. Experiments 
were performed in triplicate, each with an independent siRNA targeting 
HNRPA2B1 and the resulting log ratios were averaged for each transcript. 

b, Transcript decay rates were compared in HNRPA2B1 knock-downs versus 
mock-transfected controls. These measurements were then analysed by 
TEISER to visualize the extent to which the decay rates of transcripts carrying 
sRSM1 elements were increased following HNRPA2B1 knock-down. c, Using 
ultraviolet-crosslinking followed by immunoprecipitation, mRNAs that bind 
HNRPA2B1 were extracted and compared against the input mRNA population
(RIP-chip). The log ratio calculated for each mRNA denotes its abundance in 
the immunoprecipitated sample relative to the input control. Bins to the right 
contain the mRNAs that were captured as interacting partners with 
HNRPA2B1. Similar to the prior examples, TEISER was used to show the 
enrichment/depletion pattern of transcripts carrying sRSM1 in their 3’ UTRs. 
The values associated with each transcript were calculated as the average of log 
ratios from biological replicates. d, HNRPA2B1 binding sites were identified 
using immunoprecipitation followed by high-throughput sequencing (HITS- 
CLIP). Instances of the sRSM1 element are significantly enriched in these sites 
relative to a population of random sequences from 3’ UTRs that are not 
represented in the sequenced population. 

interacts with sRSM1 in vivo and acts to stabilize its target transcripts
through this regulatory element. These transcripts, in turn, modulate a 
variety of cellular processes and pathways. For example, we observed a 
significant positive correlation between sRSM1 target transcripts and 
doubling-time in NCI-60 breast cancer cell lines (Fig. 4a). Indeed, 
knocking-down HNRPA2B1 resulted in a slight but significant 
increase in growth rate (by 10%, P-value <10° *), further highlighting 
the regulatory role of this global modulator in a key cellular process 
(Fig. 4b). 

Revealing the detailed post-transcriptional regulatory code relies on 
the discovery of all the cis-regulatory elements that contribute to 
changes in transcript abundance. In addition to the sRSMs identified 
through TEISER, we also discovered a large diverse set of lRSMs (linear
RNA stability motifs), including six known microRNA recognition 
sites, that are informative of transcript stability measurements 
(Supplementary Fig. 9). These motifs were identified by FIRE* 
(Finding Informative Regulatory Elements), a framework for discover- 
ing informative linear motifs. Combining these two approaches pro- 
vided us with an extensive set of putative regulatory elements that 
cover both structural and primary sequence components. The next 
step in deciphering the post-transcriptional regulatory program 
involves the identification of target pathways that are potentially 
modulated by each element. Using iPAGE' (Pathway Analysis of 
Gene Expression), we showed that our discovered elements probably 
target a diverse array of cellular processes and pathways (Supplemen- 
tary Fig. 10). For example, the sRSM1 structural element is signifi- 
cantly enriched in the 3’ UTRs of the genes involved in “Notch 
signalling’, while avoiding the UTRs of other pathways such as ‘nucleo- 
some assembly’ (Supplementary Fig. 11). These results demonstrate 
that while post-transcriptional regulatory mechanisms are poorly 

Figure 4 | HNRPA2B1 regulates growth rate. a, Whole genome expression
levels across five breast cancer cell lines (MCF7, MDA-MB-231, HS578T, BT- 
549 and T47D) were correlated against their doubling times'’. The resulting 
values, ranging from —1 to 1, were analysed by TEISER to probe the 
enrichment/depletion pattern of transcripts carrying sRSM1. b, The growth of 
HNRPA2B1 siRNA-transfected samples was compared to those of mock- 
transfected controls. For each time-point, the number of cells in four 
independent samples was counted in duplicates (n = 8), yielding an estimated 
growth-rate (~). Shown are the average log-ratios, their standard deviation at 
each time-point, and the statistical significance of the observed difference in 
growth-rate. 




characterized, they have potentially far-reaching impact on specific 
cellular processes. 

Regulatory programs often employ combinatorial interactions 
between various cis-regulatory elements to modulate gene expression*’*.
We used mutual information to reveal such potential interactions
in the post-transcriptional regulatory programs governing
mRNA stability (Supplementary Figs 12 and 13). For example, 
sRSM1 showed significant interactions with a number of structural 
and linear motifs, including sRSM8 and sRSM3 (Supplementary Fig. 11). 
These observed interactions might reflect cross-talk, or insulation, 
between the underlying regulatory processes that act upstream of these 
elements. The full map of such interactions (Supplementary Figs 14 
and 15) reveals a complex network of motif-pathway relationships that 
set the stage for molecular dissection and predictive modelling of post- 
transcriptional regulation from sequence. 

Whereas we have studied mRNA stability under normal and static 
conditions in a single cell line, the full regulatory program that governs 
mRNA stability is likely to involve a much richer repertoire of cis- 
regulatory elements operating within a more complex regulatory net- 
work. Also, although we have focused on transcript stability, our 
framework is general in concept and can be employed to study 
regulatory programs governing other aspects of RNA biology. For 
example, the established role of local secondary structures in shaping 
the splicing code*” suggests alternative splicing as a prominent area 
for analysis using this framework. The large repertoire of publicly 
available whole-genome expression data sets similarly offers a rich 
resource for identifying the post-transcriptional regulatory modules 
that underlie steady-state measurements. 


METHODS SUMMARY 


TEISER relies on calculating mutual information (MI) values between whole- 
genome measurements and millions of predefined structural motifs. The statist- 
ically significant motifs are then optimized and elongated through a heuristic 
search algorithm. The mRNA stability measurements were performed using a 
previously published method’. The decoy/scrambled experiments and siRNA 
knock-downs were performed using Lipofectamine 2000 reagent (Invitrogen). For
hybridizations, we used 4 × 44k whole-genome human arrays (Agilent).
Isolation and identification of RNA-binding proteins were based on previously 
published protocols'***. HNRPA2B1 target transcripts were isolated based on the 
CLIP protocol’®. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS


TEISER: detailed description of the algorithm. Genome profile. A genome 
profile is defined across the genes in the genome, where each gene is associated 
with a unique measurement. Whole-genome measurements, discrete or continu- 
ous, can be obtained from a variety of experimental or computational sources (for 
example, Supplementary Fig. 1). 

Structural motif definition. Each structural motif is defined as a series of context- 
free statements that define the structure and sequence of the motif (Supplementary 
Fig. 2). A context-free grammar is a set of production rules that describes how 
phrases are made from their building blocks. Considering a structured RNA 
molecule as a phrase, its potential building blocks are the different base pairs 
and bulges. Loops can be considered as bulges that happen at the beginning of 
phrases. Also, internal loops can be considered as combination of left and right 
bulges in the middle of phrases. The context-free grammar that we have used 
contains the following production rules: S→S[AUCGN], S→[AUCGN]S,
S→[AUCGN]S[AUCGN]; wherein the first production rule depicts a right bulge,
the second production rule results in a left bulge, and the third production rule
creates a base-pairing. For example, consider the stem loop AAACGCUUU (the
stem region is AAA paired with UUU). Let the symbol S be a non-terminal symbol that stands
for this stem loop; the production rule S→SG adds a G to the 3’ end of the
molecule, creating a new S, AAACGCUUUG, which has an unpaired 3’-end G.
Next, using the production rule S→GSC, we can add a G to the 5’ end and a C to
the 3’ end of the molecule and make them pair with each other, again creating a 
new S, GAAACGCUUUGC, which can be further expanded in this way. Note that 
the G that we added in the previous step has now become a right bulge. 
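
The derivation above can be traced with a short sketch. The function name `apply_rule` and its keyword encoding of production rules are illustrative assumptions and are not taken from the TEISER code.

```python
# Illustrative sketch only: this rule encoding is an assumption, not TEISER's format.
def apply_rule(phrase, left="", right=""):
    """Apply one production rule S -> left S right to an RNA phrase.

    right only   : right bulge (S -> Sx)
    left only    : left bulge  (S -> xS)
    left + right : base pair   (S -> xSy)
    """
    return left + phrase + right


stem_loop = "AAACGCUUU"                          # seed stem loop from the text
step1 = apply_rule(stem_loop, right="G")         # S -> SG: unpaired G at the 3' end
step2 = apply_rule(step1, left="G", right="C")   # S -> GSC: new G-C pair encloses it

print(step1)
print(step2)
```

After the second rule, the G added in the first step sits inside the new G-C pair and therefore behaves as a right bulge.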

Motif profile. For every given motif, we create a binary vector across all the genes, 
in which ‘1’ denotes the presence and ‘0’ denotes the absence of that motif. This 
vector is called a motif profile. 

Creating seed CFGs. We used, as the seed motifs, an exhaustive set of context-free 
statements that represented all possible stem-loop structures that satisfied the 
following criteria: stem length of at least 4 bp and at most 7 bp; loop length of at 
least 4 nt and at most 9 nt; at least 4 and at most 6 production rules representing 
non-degenerate bases (that is, production rules that are not S→SN, S→NS, or
S→NSN); and information content of at least 14 bits and at most 20 bits. The
information content of the motif M, which is represented by n production rules,
was defined as $-\log_2(p_M)$, wherein $p_M$ is the probability that a random sequence of
length $l$ matches the n production rules of motif M, with $l$ being equal to $2n_1 + n_2$,
in which $n_1$ is the number of production rules that represent base pairings and $n_2$ is
the number of production rules that represent bulges ($n_1 + n_2 = n$).
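
A minimal sketch of this calculation follows. It assumes uniform, independent nucleotide frequencies and treats Watson-Crick pairs plus G-U wobble pairs as the allowed base pairings; neither assumption is stated in the text, and the tuple encoding of rules is hypothetical.

```python
import math

# Assumption: a position pair counts as "paired" if it is Watson-Crick or G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def rule_probability(rule):
    """Probability that random bases satisfy one rule (uniform base frequencies)."""
    if rule[0] == "bulge":
        return 1.0 if rule[1] == "N" else 0.25
    five, three = rule[1], rule[2]
    matches = sum(1 for a, b in ALLOWED_PAIRS
                  if five in ("N", a) and three in ("N", b))
    if matches == 0:
        raise ValueError(f"rule {rule} can never form a base pair")
    return matches / 16.0


def information_content(rules):
    """Return (-log2(p_M), l) with l = 2*n1 + n2 for a list of production rules."""
    n1 = sum(1 for r in rules if r[0] == "pair")   # base-pairing rules
    n2 = len(rules) - n1                           # bulge rules
    p_m = math.prod(rule_probability(r) for r in rules)
    return -math.log2(p_m), 2 * n1 + n2


rules = [("pair", "G", "C"), ("pair", "N", "N"), ("bulge", "A"), ("bulge", "N")]
bits, length = information_content(rules)
print(round(bits, 3), length)
```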

Quantizing continuous genome profiles. Mutual information is defined for both 
continuous and discrete random variables; however, in practice, continuous data 
are discretized before calculating the mutual information (MI) values. Our quant- 
ization procedure involves using equally populated ‘bins’. Thus, the discretization 
step only requires a single parameter, that is, the number of genes in each bin. In 
TEISER, we have set the default number of bins to 30 ($N_e$ = 30). It should be noted
that the results are not sensitive to variations in the value of $N_e$ as long as $N_e$ is > 10
and each bin has more than ~100 associated transcripts. 
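
Equal-population binning of this kind can be sketched as below. The function name `quantize` is hypothetical, and ties are broken by sort order, which is an assumption rather than documented TEISER behaviour.

```python
def quantize(values, n_bins=30):
    """Assign each value to one of n_bins equally populated bins (0 = lowest values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Rank r out of n genes falls in bin floor(r * n_bins / n).
        bins[idx] = rank * n_bins // len(values)
    return bins


print(quantize([0.5, 0.1, 0.9, 0.3, 0.7, 0.2], n_bins=3))
```

Because only ranks are used, any monotonic transformation of the input measurements yields the same bins.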

Removing recently duplicated genes. Recently duplicated members of gene 
families or transposons often share a significant amount of sequence identity in 
their UTRs. They also tend to cross-hybridize on the arrays and show a high 
artificial correlation. This would in turn bias our search towards conserved ele- 
ments in the UTRs of these genes. In TEISER, similar to FIRE’, we remove the 
duplicates that have similar values (for example, fall in the same bin after quant- 
ization of the input genome profile). A MegaBlast E-value cutoff of 1 x 10” '° was 
used to identify duplicates. 

Calculating the mutual information values. We performed mutual information (MI) 
calculations between the genome profile and the motif profiles using algorithms 
introduced and described elsewhere*””. These algorithms take the necessary steps to 
ensure reliable MI calculations (for example, minimum sample sizes for reliable 
estimation of joint distributions). 
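
For orientation, a plain plug-in estimate of the mutual information between a quantized genome profile and a binary motif profile can be written as follows. This sketch omits the reliability safeguards of the cited algorithms, and the function name is an assumption.

```python
import math
from collections import Counter


def mutual_information(bins, motif_profile):
    """Plug-in MI (in bits) between bin labels and a 0/1 motif profile."""
    n = len(bins)
    joint = Counter(zip(bins, motif_profile))
    bin_counts = Counter(bins)
    motif_counts = Counter(motif_profile)
    mi = 0.0
    for (b, m), count in joint.items():
        # p(b, m) * log2( p(b, m) / (p(b) * p(m)) ), written with raw counts
        mi += count / n * math.log2(count * n / (bin_counts[b] * motif_counts[m]))
    return mi


print(mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))  # motif confined to one bin
print(mutual_information([0, 0, 1, 1], [1, 0, 1, 0]))  # motif spread evenly
```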

Randomization-based statistical testing. To assess the statistical significance of the 
calculated MI values, TEISER uses a non-parametric randomization-based stat- 
istical test. In this test, the genome profile is shuffled 1,500,000 times and the 
corresponding MI values are calculated. A motif is deemed significant only if 
the real MI value is greater than all of the randomly generated ones. In TEISER, 
in order to minimize the required number of tests, structural motifs are first sorted 
based on the MI values (from high to low) and the statistical test is applied in order. 
When 20 contiguous motifs in the sorted list do not pass the test, the procedure is 
terminated. 
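
The sorted test with early termination can be sketched as follows. The plug-in MI estimator, the function names and the reduced default number of shuffles (1,000 rather than 1,500,000) are assumptions made to keep the example small.

```python
import math
import random
from collections import Counter


def mutual_information(labels, motif):
    """Plug-in MI (bits) between discrete labels and a 0/1 motif profile."""
    n = len(labels)
    joint, p_l, p_m = Counter(zip(labels, motif)), Counter(labels), Counter(motif)
    return sum(c / n * math.log2(c * n / (p_l[a] * p_m[b]))
               for (a, b), c in joint.items())


def significant_motifs(profile, motifs, n_shuffles=1000, max_failures=20, seed=0):
    """Return names of motifs whose real MI exceeds every shuffled-profile MI."""
    rng = random.Random(seed)
    ranked = sorted(motifs, reverse=True,
                    key=lambda name: mutual_information(profile, motifs[name]))
    passed, failures_in_a_row = [], 0
    for name in ranked:                      # highest MI first
        real = mutual_information(profile, motifs[name])
        shuffled = list(profile)
        beaten = False
        for _ in range(n_shuffles):
            rng.shuffle(shuffled)
            if mutual_information(shuffled, motifs[name]) >= real:
                beaten = True                # one random MI as high is enough to fail
                break
        if beaten:
            failures_in_a_row += 1
            if failures_in_a_row >= max_failures:
                break                        # stop after 20 contiguous failures
        else:
            passed.append(name)
            failures_in_a_row = 0
    return passed


profile = [0] * 20 + [1] * 20
motifs = {"good": [1] * 20 + [0] * 20, "noise": [1, 0] * 20}
print(significant_motifs(profile, motifs, n_shuffles=200))
```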

Optimization of the identified seeds into more informative motifs. Our initial 
collection of structural motifs, despite being large, is a coarse-grained sampling 
of the entire space. Mainly, it provides us with a set of informative seeds that 
should be later optimized into closer representations of their actual form’. 


Accordingly, all the structural motifs that pass the previous stage are further 
optimized and elongated. The process involves: (1) optimization: randomly 
choose one of the context-free statements (production rules) from the motif 
and convert its sequence information to all possible combinations of nucleotides. 
Evaluate all the resulting structural motifs and accept the one that results in the 
highest MI value. (2) Elongation: production rules are added to the end of the 
context-free phrase that represents the motif, thus extending its effective length in 
the form of a base pair or a bulge. The increase in length is similarly accepted only if
it results in a higher MI value. 

Removing redundantly informative structural motifs. Motifs that redundantly 
represent the same potential cis-regulatory elements are identified and removed 
using the concept of conditional information as described before”"®. 

Finding robust motifs. TEISER also performs jack-knife resampling to find robust 
motifs that are not over-sensitive to the composition of the input data. For each 
predicted motif, we perform 10 jackknifing trials where, in each trial, one third of 
the genes are randomly removed and the mutual information value and its stat- 
istical significance is evaluated. The robustness score is then defined as the number 
of trials in which the motif remains significant (scores better in the original 
genome profile than in all the randomly shuffled genome profiles) after resam- 
pling, ranging from 0/10 to 10/10. By default, TEISER requires the motif to be 
significant in more than half of the trials (a robustness score equal to or greater 
than 6/10). While this parameter can be changed at the user’s discretion, our 
experience with both TEISER and FIRE’ suggests that this threshold results in 
very low false discovery rates across a variety of data sets (discrete and continuous). 
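
The jack-knife procedure can be sketched as below. The significance test is passed in as a callable (`is_significant`, a hypothetical name) so that the example stays independent of any particular MI implementation.

```python
import random


def robustness_score(genes, is_significant, n_trials=10, seed=0):
    """Count the trials in which the motif stays significant after dropping 1/3 of genes."""
    rng = random.Random(seed)
    n_keep = len(genes) - len(genes) // 3      # remove one third at random
    hits = 0
    for _ in range(n_trials):
        subset = rng.sample(genes, n_keep)
        if is_significant(subset):
            hits += 1
    return hits


def is_robust(score, n_trials=10):
    """Default rule from the text: significant in more than half of the trials."""
    return score > n_trials / 2


genes = list(range(30))
score = robustness_score(genes, lambda subset: True)   # stand-in test that always passes
print(score, is_robust(score))
```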

Patterns of motif enrichment and depletion. For a given motif, a high mutual
information value results from the non-random distribution of its targets across 
the input range. This results in significant patterns of enrichment and depletions 
across the genome profile, which can be quantified by calculating enrichment/ 
depletion scores. These scores result from the log transformation of P-values 
calculated based on the hypergeometric distribution, as described previously’. 
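
A sketch of such a score for a single bin is given below. The sign convention (positive for over-representation, negative for under-representation) and the function names are assumptions; the cited work should be consulted for the exact definition.

```python
from math import comb, log10


def hypergeom_range(lo, hi, population, carriers, draws):
    """P(lo <= X <= hi), X = number of motif carriers among `draws` sampled genes."""
    total = comb(population, draws)
    hits = sum(comb(carriers, i) * comb(population - carriers, draws - i)
               for i in range(lo, hi + 1))
    return hits / total


def enrichment_score(observed, population, carriers, bin_size):
    """Signed log10 P-value: > 0 for enrichment in the bin, < 0 for depletion."""
    upper = min(carriers, bin_size)
    lower = max(0, bin_size - (population - carriers))
    p_over = hypergeom_range(observed, upper, population, carriers, bin_size)
    p_under = hypergeom_range(lower, observed, population, carriers, bin_size)
    if p_over < p_under:
        return -log10(p_over)
    return log10(p_under)


# 100 genes, 20 carry the motif, the bin holds 10 genes.
print(enrichment_score(8, 100, 20, 10))   # 8 carriers in the bin: enriched
print(enrichment_score(0, 100, 20, 10))   # no carriers in the bin: depleted
```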

Final statistical tests. In case the genome profile is continuous, one can require
TEISER to return motifs that are enriched at one end of the data range or the other 
(for example, structural motifs in Fig. 1). TEISER accomplishes this through 
calculating the Spearman correlation between the enrichment scores and the 
average data value across all the bins. For the structural motifs in Fig. 1, the 
P-value threshold for these Spearman correlations was set to 0.001 (for 
Supplementary Fig. 3, this value is 0.01 which puts the FDR at 10%). It should 
be noted, however, that other statistical tests could be used in this step at the 
discretion of the user. The goal, ultimately, is to identify the motifs that show 
significant enrichments at either end of the data range. 

Inter-species conservation. For each motif, we also calculate a conservation score 
based on its network-level conservation with respect to a related genome’. For this, 
orthologous transcripts in both genomes are scanned for the presence/absence of 
the motif. The overlap of positive sequences between the orthologous sequences is 
used to calculate a hypergeometric P-value’. The conservation score is then 
defined as 1 —P, which ranges between 0 and 1 (1 being highly conserved between 
the two genomes). In this study, we have used the human and mouse genomes to 
calculate the conservation scores associated with each structural motif. 

Finding potentially active instances of each motif. As described previously’, we 
defined the target genes of a predicted motif as all transcripts whose 3’ or 5’ UTRs 
contain the motif and are associated with a category/bin where the motif is 
enriched. In other words, these are the transcripts whose UTRs contain potentially 
‘active’ motif occurrences. Upon identifying these likely targets for each structural 
motif, a weight-matrix can be generated from these potentially functional 
instances as a post-processing step (Supplementary Table 2). 

False-discovery rate. In order to assess the false discovery rate, we ran 30 trials with 
shuffled 5’ and 3’ UTR sequences. In all the trials, not a single motif passed all the 
statistical tests. Thus, in case of the stability data set, the number of false positives 
in each trial, on average, is smaller than 1/30 ≈ 0.033, which corresponds to an FDR
of <0.01.

Predicting functional interactions. Given two motifs, structural or linear, one can 
assess their putative functional interaction through measuring how informative 
the presence of one would be about the presence or absence of the other. For 
revealing these interactions, we again use mutual information values calculated for 
pairwise motif profiles of structural and linear motifs. Randomization-based stat- 
istical tests are then used to find the significant interactions. For this, one of the 
motif profiles is shuffled 10,000 times and the interaction is deemed significant 
only if the real mutual information value is higher than all the 10,000 random ones. 

Predicting the target pathways. iPAGE”, with default settings, was used to identify
the likely pathways that are regulated by the discovered structural and linear 
motifs. 

Availability. TEISER is available online for download at https://tavazoielab.c2b2. 
columbia.edu/TEISER. 

Measuring mRNA stability. RNA stability measurements were performed based
on a previously published protocol’. In short, MDA-MB-231 cells at 70% confluency
were incubated in the presence of 25 µM 4-thiouridine (Sigma) for 4 h.
Then the cells were washed with fresh media (DMEM + 10% FBS) and incubated
for 0, 1, 2 and 4 h. At each time point, cells were washed with cold PBS and RNA
extraction was performed using a total RNA purification kit (Norgen Biotek). The 
4-thiouridine thiol groups were then biotinylated using EZ-Link Biotin-HPDP
(Pierce). We subsequently used µMACS magnetic columns (Miltenyi Biotec) to
capture the labelled RNAs. The resulting samples were then processed for one-colour
hybridization using a one-colour low-input quick-amp labelling kit
(Agilent) and hybridized according to the manufacturer’s instructions. A one-colour
RNA spike-in kit (Agilent) was used as endogenous control to normalize
values between arrays. For each transcript, the drop in signal as a function of time 
was used as a measure of mRNA stability (Supplementary Fig. 1): 


$$r = -\frac{1}{t}\ln\frac{S_t}{S_0},$$

where $S_t$ denotes signal at time $t$. Linear regression was used
to calculate r for each transcript based on the hybridization signals from the four
time points. It should be noted that TEISER is a non-parametric approach, thus it
is the ranking rather than the actual stability values that underlies our motif 
discovery. 
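
As an illustration, a decay rate can be estimated from four time points by ordinary least squares on the log-transformed signal. The sketch assumes first-order (exponential) decay and fits an intercept; the function name and the example signal values are hypothetical.

```python
import math


def decay_rate(times, signals):
    """Least-squares slope of ln(signal) against time, sign-flipped so r > 0 means decay."""
    logs = [math.log(s) for s in signals]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx


times = [0, 1, 2, 4]                                     # hours, as in the protocol
signals = [100.0 * math.exp(-0.3 * t) for t in times]    # synthetic transcript, r = 0.3 per hour
print(decay_rate(times, signals))
```

Since the downstream analysis is rank-based, only the ordering of the estimated rates across transcripts matters.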

Transfection of decoy and scrambled oligonucleotides. We chose real instances 
of the sRSM1 structural motifs from NM_014363, which contains four instances of
sRSM1, to create two decoy sets of sequences, each containing two of these 
instances (underlined) along with part of the real sequences as context. Set 1: 
AAAACTATTTTGAAGATGGTGGTGAGCTGCAAAATAGCTGGATGGATT 
TGAATGATTGGGATGATACATCATTGAACTGCACTTTATATAACCAAA 
GCTTAGCAGTTTGTTAGATAAGAGTCTATGTATGTCTCTGGTTAGGATG 
AAGTTAATTTTATGTTTTTAACATGGTATTTTTGAAGGAGCTAATGAAA 
CACTGG. Set 2: ATTGTTTCTGGAAACTGCTTGCCAAGACAACATTTATTA
ACTGTTAGAACACTTGCTTTATGTTTGTGTGTACATATTTTCCACAAAT 
GTTATAATTTATATAGTGTGGTTGAACAGGATGCAATCTTTTGTTGTCT 
AAAGGTGCTGCAGTTAAAAAAAAAACAACCTTTTCTTTCAATATGGCAT 
GTAGTGGAGTTTTT. For the scrambled controls, we used the shuffled version of 
the putative binding sites (see Supplementary Fig. 5). These two decoy/scrambled 
sets were then chemically synthesized (IDT). An upstream T7 promoter was used to 
transcribe the constructs in vitro using Megascript T7 kit (Ambion). In order to 
reduce cytotoxicity, RNA molecules were capped and poly-A tailed using Cap 
Analogue (Ambion) and poly-A polymerase (NEB). MDA-MB-231 cells at 80% 
confluency were transfected with the resulting RNA oligos using Lipofectamine 2000
reagent (Invitrogen) according to manufacturer’s recommendations. Experiments 
were performed in duplicates for each set. Forty-eight hours post-transfection, we 
extracted RNA and differentially labelled the samples with Cy3 or Cy5 dyes. The 
samples were then hybridized on Agilent human gene expression arrays (4 X 44k). 
The Cy3/Cy5 ratios from the two biological replicates were then averaged into a 
single data set as log of ratios, which was then analysed by TEISER. 

Reporter system for testing the functionality of sRSM1 instances. The plasmid 
pcDNA5/FRT/TOPO (Invitrogen) was used to clone a GFP-coding sequence along
with a gateway cloning site downstream of GFP (in its 3’ UTR). Decoy and 
scrambled sequences (Set 1 in the previous section) were subsequently cloned into 
the resulting construct using the gateway site. The resulting plasmids were trans- 
fected into the Flp-In 293 cell line (Invitrogen), and the cells were grown in 
Hygromycin for selecting stably transfected cells. The resulting cell lines, named 
Flp-In 293 GFP-Decoy and Flp-In 293 GFP-Shuffled, were subjected to FACS 
measurements to quantify GFP expression. For the decay rate measurements, cells 
were incubated in media with 5 µg ml⁻¹ of α-amanitine (Sigma). Time points were
taken at 0, 1.5, 3 and 6h in duplicates for Flp-In 293 GFP-Decoy and Flp-In 293 
GFP-Shuffled cells. Quantitative PCR (Fast SYBR Green Master Mix, Ambion) 
was then used to determine the relative quantity of GFP transcript in each cell line 
at different time-points using 18S rRNA as endogenous control. 

Identifying binding candidates of sRSM1. We used a published protocol’ to 
isolate potential RNA-binding proteins that bind sRSM1. In short, the StreptoTag 
aptamer was added downstream of the Set 1 decoy and scrambled sequences. The 
resulting RNAs were then immobilized on a dihydrostreptomycin Sepharose col- 
umn (GE Healthcare) and were used to immunoprecipitate potential partners. 
Total protein was extracted from MDA-MB-231 cells (Total Protein Extraction 
Kit, Millipore), 1,000 µg of which was used as input to each column. Samples were
then washed, eluted in 10 µM streptomycin and subjected to in-solution digestion***’.
Tryptic peptides were then analysed by nanoliquid chromatography-tandem
mass spectrometry using an Ultimate 3000 nRSLC (Dionex) coupled
online to an LTQ-Orbitrap Velos mass spectrometer (Thermo Scientific), as previ- 
ously described**. 

HNRPA2B1 knock-down. The ON-Targetplus (Dharmacon) set of siRNAs for
HNRPA2B1 (target sequences: GAGGAGGAUCUGAUGGAUA, GGAGAGUAGUUGAGCCAAA,
and GCUGUUUGUUGGCGGAAUV) were used to transfect
MDA-MB-231 cells (grown in D10F medium) using Lipofectamine 2000
(Invitrogen). Three of the four tested siRNAs resulted in a substantial knock-down 
in HNRPA2B1 (more than twofold reduction in expression, log ratio >0.4 and 
P<10 ”) and their corresponding samples were used for hybridization. Forty- 
eight hours post-transfection, we extracted total RNA from each sample along 
with mock-transfected controls. We then differentially labelled the RNA samples 
with Cy3 and Cy5 dyes and hybridized them to Agilent human gene expression 
arrays (4 X 44k). The log of signal ratios was used as a measure of differential 
expression between the samples and controls. These values were averaged across 
the three samples and were subsequently analysed by TEISER to assess the enrich- 
ment/depletion pattern of sRSM1 across the distribution. 

For the decay rate measurements, forty-eight hours post-transfection, cells were 
incubated in media with 5 µg ml⁻¹ of α-amanitine (Sigma). Time points were
taken at 0, 1, 2 and 4h in duplicates for the siRNA-transfected samples and 
mock-transfected controls. Each sample was then Cy3-labelled and hybridized 
to expression arrays (Agilent 4 x 44k) in duplicates and the reported signals were 
used to calculate decay rates. Following this procedure, for each transcript, four 
decay rates (two biological replicates, each having two technical replicates) were 
calculated from the siRNA-transfected samples and four decay rates from the 
controls. For each transcript, we then calculated a value according to s(1 — P), 
where P is the t-test P-value between the two sets and s denotes whether the decay 
rates are higher in the siRNA samples (+1) or the mock controls (−1). After this
transformation, the data range is between —1 and 1 with the background genes 
(the transcripts that show little change between the two samples) around 0. 
TEISER was then used to visualize the enrichment pattern of sRSM1 across this 
data range. 

Identifying transcripts that interact with HNRPA2B1 (RIP-chip). A myc- 
tagged ORF clone of HNRPA2B1 (variant A2, OriGene) was transfected into 
MDA-MB-231 cells (grown in D10F medium) using Lipofectamine LTX and 
Plus reagent (Invitrogen). Seventy-two hours post-transfection, the cells were 
washed with cold PBS and ultraviolet-irradiated at 4,000 mJ cm⁻². The cells were
then collected and lysed with 1 ml M-PER Reagent (Pierce) and 10 µl RNasin
(NEB). The samples were subjected to DNase treatment (baseline ZERO
DNase) for 15 min at 37°C. Samples were then centrifuged at 16,000g at 4°C
for 20 min to pellet the cell debris. Immunoprecipitation of tagged HNRPA2B1 
protein was performed using Mammalian c-Myc Tag IP/Co-IP Kit (Pierce) per 
manufacturer’s instructions. Upon elution, samples were subjected to proteinase K 
digestion and polyadenylation. The RNA molecules in each sample were extracted 
using RNeasy MinElute Cleanup Kit (Qiagen) and Cy3-labelled using low-input 
quick-amp labelling kit (Agilent). As control, we used Cy5-labelled RNA samples 
extracted before HNRPA2B1 immunoprecipitation. The samples were hybridized 
to Agilent human gene expression arrays (4 X 44k) and the log of signal ratios was 
used as a measure of transcript affinity to HNRPA2B1. For each transcript, affinity 
values were averaged across two biological replicates and TEISER was used to 
assess the enrichment/depletion pattern of sRSM1. 

Identifying 3’UTR binding sites of HNRPA2B1 (HITS-CLIP). A strategy 
similar to that of target transcript identification was used to discover the 
HNRPA2B1 binding sites. Upon ultraviolet-irradiation of mycHNRPA2B1- 
transfected cells, the samples were subjected to the HITS-CLIP protocol previously 
described elsewhere*’. ChIPseeqer”’, an integrated ChIP-seq analysis platform,
was used to identify binding sites and extract real and random sequences (default 
parameters) for analysis with TEISER. 

Measuring growth-rates in HNRPA2B1 knock-down cells. HNRPA2B1 
siRNAs (Dharmacon) were used to knock-down the expression of this regulator. 
Seventy-two hours post-transfection, four independent samples were harvested 
and counted in duplicates as the baseline number of cells at time zero. Similarly, 
samples were counted at 25, 49.5, 73.5 and 99.5 h time-points. The same experi- 
ment was performed for mock-transfected cells. Using an exponential growth 
model, the log-ratio of the counted cells at each time-point was used to estimate 
a growth rate for siRNA-transfected and mock-transfected samples. ANCOVA 
was used to determine the P-value associated with the observed differences 
between the two growth rates. 
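
The growth-rate estimate can be sketched as a regression through the origin of the log-ratio of cell counts on time. Natural logarithms, the function name and the synthetic counts are assumptions; the ANCOVA comparison between conditions is not reproduced here.

```python
import math


def growth_rate(times, counts, baseline):
    """Slope through the origin of ln(count / baseline) against time (per hour)."""
    log_ratios = [math.log(c / baseline) for c in counts]
    return sum(t * y for t, y in zip(times, log_ratios)) / sum(t * t for t in times)


times = [25, 49.5, 73.5, 99.5]                           # hours after the baseline count
baseline = 1000.0                                        # hypothetical count at time zero
counts = [baseline * math.exp(0.02 * t) for t in times]  # synthetic exponential growth
rate = growth_rate(times, counts, baseline)
print(rate, math.log(2) / rate)                          # rate and implied doubling time
```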


25. Wisniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample 
preparation method for proteome analysis. Nature Methods 6, 359-362 (2009). 
microRNA-mRNA interaction maps. Nature 460, 479-486 (2009).

Thermal and electrical conductivity of iron at Earth’s core conditions

Monica Pozzo, Chris Davies, David Gubbins & Dario Alfè


The Earth acts as a gigantic heat engine driven by the decay of 
radiogenic isotopes and slow cooling, which gives rise to plate 
tectonics, volcanoes and mountain building. Another key product 
is the geomagnetic field, generated in the liquid iron core by a 
dynamo running on heat released by cooling and freezing (as the 
solid inner core grows), and on chemical convection (due to light 
elements expelled from the liquid on freezing). The power supplied 
to the geodynamo, measured by the heat flux across the core-man- 
tle boundary (CMB), places constraints on Earth’s evolution’. 
Estimates of CMB heat flux”* depend on properties of iron mixtures 
under the extreme pressure and temperature conditions in the core, 
most critically on the thermal and electrical conductivities. These 
quantities remain poorly known because of inherent experimental 
and theoretical difficulties. Here we use density functional theory 
to compute these conductivities in liquid iron mixtures at core con- 
ditions from first principles—unlike previous estimates, which 
relied on extrapolations. The mixtures of iron, oxygen, sulphur 
and silicon are taken from earlier work’ and fit the seismologically 
determined core density and inner-core boundary density jump”*. 
We find both conductivities to be two to three times higher than 
estimates in current use. The changes are so large that core thermal 
histories and power requirements need to be reassessed. New 
estimates indicate that the adiabatic heat flux is 15 to 16 terawatts 
at the CMB, higher than present estimates of CMB heat flux based on 
mantle convection’; the top of the core must be thermally stratified 
and any convection in the upper core must be driven by chemical 
convection against the adverse thermal buoyancy or lateral varia- 
tions in CMB heat flow. Power for the geodynamo is greatly 
restricted, and future models of mantle evolution will need to 
incorporate a high CMB heat flux and explain the recent formation 
of the inner core. 

First principles calculations of transport properties based on density 
functional theory (DFT) have been used in the past for a number of 
materials (see, for example, refs 9, 10). Recently, increased computer 
power has facilitated simulations of large systems, allowing the problem 
of the size of the simulation cell to be addressed: this can be a serious 
problem for the electrical conductivity, σ (ref. 11). Here we report a series 
of calculations of the electrical and thermal conductivity (k) of iron at 
Earth’s core conditions, using DFT. We previously used these methods to 
compute an extensive number of thermodynamic properties of iron and 
iron alloys, including the whole melting curve of iron in the pressure 
range 50-400 GPa (refs 12, 13) and the chemical potentials of oxygen, 
sulphur and silicon in solid and liquid iron at inner core boundary (ICB) 
conditions, which we used to place constraints on core composition. 
Recently, we computed the conductivity of iron at ambient conditions, 
and obtained values in very good agreement with experiments. 

We calculated three adiabatic temperature-pressure profiles 
(adiabats) for the core; to do this, we assumed three different possible 
temperatures at the ICB, and followed the line of constant entropy as 
the pressure was reduced to that of the CMB. The ICB temperatures 
were: 6,350 K (the melting temperature of pure iron), 5,700 K (the 
melting temperature of a mixture of iron with 10% Si and 8% O, 
corresponding to an inner-core density jump Δρ = 0.6 g cm⁻³) and 
5,500 K (the melting temperature of a mixture of iron with 8% Si and 
13% O, corresponding to Δρ = 0.8 g cm⁻³). Then we calculated the 
electrical and thermal conductivity of iron at seven positions on these 
three adiabats. Our results are reported in Fig. 1, and show a smooth 
variation of these parameters in the core; σ only varies by ~13% 
between the ICB and the CMB, and it is almost the same for all adiabats. 
A recent shock wave experiment reported σ = 0.765 × 10⁶ Ω⁻¹ m⁻¹ 
for pure iron at 208 GPa, and an older shock wave measurement 
reported σ = 1.48 × 10⁶ Ω⁻¹ m⁻¹ at 140 GPa. Our values are closer 
to the latter. There is a larger variation in k, as implied by the 
Wiedemann-Franz law (which relates the thermal and electrical 
conductivity through L = k/σT), which we found to be closely 
followed throughout the core with a Lorenz parameter L = (2.48-2.5) 
× 10⁻⁸ W Ω K⁻². The ionic contribution to k was calculated using 
the classical potential used as a reference system in ref. 12, which was 
shown to describe very accurately the energetics of the system and the 
structural and dynamical properties of liquid iron at Earth’s core 
conditions. We found that the ionic contribution is only between 2.5 and 
4 W m⁻¹ K⁻¹ on the adiabat, which is negligible compared to the 
electronic contribution, as expected. 
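
As a quick consistency check on the Wiedemann-Franz relation quoted above, the following sketch evaluates k = LσT for pure iron. It is an illustration only, not the authors' procedure: the function name is ours, the Lorenz parameter is taken as 2.5 × 10⁻⁸ W Ω K⁻², and pairing the pure-iron conductivities of Table 1 with the temperatures of the Δρ = 0.6 g cm⁻³ row is an assumption.

```python
def thermal_conductivity(sigma, temperature, lorenz=2.5e-8):
    """Electronic thermal conductivity k = L * sigma * T (Wiedemann-Franz law).

    sigma in ohm^-1 m^-1, temperature in K, lorenz in W ohm K^-2.
    """
    return lorenz * sigma * temperature


# Pure-iron electrical conductivities and temperatures as listed in Table 1
# (row with T_ICB = 5,700 K; this pairing is assumed for illustration).
k_icb = thermal_conductivity(1.56e6, 5700.0)
k_cmb = thermal_conductivity(1.36e6, 4186.0)
print(f"k(ICB) ~ {k_icb:.0f} W/m/K, k(CMB) ~ {k_cmb:.0f} W/m/K")
```

Both results fall within about 1-2% of the pure-iron thermal conductivities listed in Table 1 (223 and 144 W m⁻¹ K⁻¹), as expected if the law is closely followed.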

The estimates of k (Fig. 1) are substantially larger than previously 
used in the geophysical literature, approximately doubling the heat 
conducted down the adiabatic gradient in the core and halving the 
power to drive a dynamo generating the same magnetic field. These 
considerations demand a revision of the power requirements for the 
geodynamo. The conductivities for liquid mixtures appropriate to the 
outer core are likely to be smaller than for pure iron, preliminary 
calculations suggesting about 30% lower, a smaller difference than that 
found in previous work, but in close agreement with extrapolations 
obtained from recent diamond-anvil-cell experiments, which reported 
a value in the range 90-130 W m⁻¹ K⁻¹ at the top of the outer core. 
Our values are also in broad agreement with recently reported DFT 
calculations. 

We focus on estimates for the two mixtures above, corresponding to 
ICB density jumps of 0.6 g cm⁻³ (ref. 8) and 0.8 g cm⁻³ (ref. 7). There is 
relatively little effect on the conductivities in the two cases, because any 
additional O in the outer core must be balanced by less S or Si to 
maintain the mass of the whole core, which is well constrained. The 
larger density jump gives a higher O content, more gravitational 
energy, a lower ICB temperature and lower adiabatic gradient: it there- 
fore favours compositional over thermal convection. The relevant 
values are given in Table 1. 

We estimate power requirements for the dynamo using the model 
described in a previous study (ref. 5, and Methods). Neglecting 
small effects, the total CMB heat flux, Q_CMB, is the sum of terms 
proportional to either the CMB cooling rate, dT_o/dt, or the amount 
of radiogenic heating, h: Q_CMB = Q_s + Q_L + Q_g + Q_r, where the terms 
on the right-hand side represent respectively the effects of secular 
cooling, latent heat, gravitational energy and radiogenic heating. The 
cooling rate, expressed in degrees per billion years, can be varied 
together with the radiogenic heating to produce some desired outcome: 
a fixed mantle heat flux, a marginal dynamo (no entropy left for ohmic 
dissipation, E_J), or a primordial inner core (by decreasing the cooling 
rate and increasing the radiogenic heating). Results for a suite of 11 
models are shown in Table 2. 

Figure 1 | Electrical and thermal conductivity of iron at Earth’s outer core 
conditions. a-c, Electrical conductivity, σ (a), and electronic component of 
thermal conductivity, k (b), of pure iron corresponding to the three outer-core 
adiabatic profiles (adiabats) displayed in c. Black lines, adiabat corresponding 
to the melting temperature of pure iron at ICB pressure; red lines, that of the 
mixture containing 10% Si and 8% O; and blue lines, that of the mixture with 
8% Si and 13% O. Lines are quadratic fits to the first principles raw data 
(symbols). Error bars (2 s.d.) are estimated from the scattering of the data 
obtained from 40 statistically independent configurations. Results are obtained 
with cells including 157 atoms and the single k-point (1/4,1/4,1/4), which are 
sufficient to obtain convergence within less than 1%. 

Model 1 fails as a dynamo. There is an entropy deficit, meaning the 
assumption that the whole core can convect is incorrect: the temperature 
gradient must fall below the adiabat to balance the entropy 
equation. A dynamo might still be possible with a large part of the 
core completely stratified. Model 2 demonstrates the efficiency of 
compositional convection: the entropy is greatly increased compared 
to model 1 with no change in cooling rate and little increase in heat 
flux; the dynamo is now marginal. Model 3 has an increased cooling 
rate and consequent younger inner core to demonstrate what is 
required for a marginal dynamo with Δρ = 0.6 g cm⁻³. Models 4 
and 5 have cooling rates that make the CMB thermally neutral; the 
CMB heat flux is equal to that conducted down the adiabat. Models 6 
and 7 have some radiogenic heating and the original cooling rate and 
operate as dynamos, although they are still thermally stable at the top of 
the core. Models 8-11 have cooling rates that yield old inner-core ages, 
3.5 and 4.5 Gyr, and the radiogenic heating has been adjusted to make a 
marginal dynamo. They are also thermally stable at the top of the core. 

Table 1 | Parameters used to estimate power requirements for the 
geodynamo 

Δρ    T_ICB   T_CMB   k_ICB       k_CMB       σ_ICB (×10⁶)   σ_CMB (×10⁶)   O    S/Si 
0.6   5,700   4,186   150 (223)   100 (144)   1.25 (1.56)    1.11 (1.36)    8    10 
0.8   5,500   4,039   150 (215)   100 (140)   1.24 (1.57)    1.11 (1.37)    13   8 

Values in parentheses are for pure iron; other values are approximations for core mixtures. Units are 
g cm⁻³ for the ICB density jump, Δρ; K for the temperatures, T; W m⁻¹ K⁻¹ for the thermal conductivity, 
k; Ω⁻¹ m⁻¹ for the electrical conductivity, σ; % for molar concentrations. 

Table 2 | Heat flux and entropy for various models of cooling and 
radiogenic heating 

Model   Δρ    dT_o/dt   h     Q_ad   Q_CMB   IC age   E_J    Δ 
1       0.6   46        0     15.7   5.8     0.9      −111   1,022 
2       0.8   46        0     15.2   6.1     1.0      5      826 
3       0.6   57        0     15.7   7.2     0.7      −2     833 
4       0.6   123       0     15.7   15.6    0.3      652    110 
5       0.8   115       0     15.2   15.2    0.4      865    0 
6       0.6   46        3.0   15.7   11.7    0.9      85     659 
7       0.8   46        3.0   15.2   11.3    1.0      208    468 
8       0.6   12        6.8   15.7   14.7    3.5      −3     1,257 
9       0.6   8.7       6.9   15.7   14.5    4.5      −1     1,472 
10      0.8   12.2      6.3   15.2   13.7    3.5      4      1,000 
11      0.8   9.5       6.6   15.2   14.1    4.5      2      1,128 

Here Δρ is the density jump at the ICB in g cm⁻³; dT_o/dt the cooling rate of the CMB in K Gyr⁻¹; h the 
radiogenic heat source in pW kg⁻¹; Q_ad = −4πr²k(dT_ad/dr) is the heat conducted down the adiabat in TW, 
where dT_ad/dr is the adiabatic gradient; Q_CMB is the heat flux across the CMB in TW; E_J is the entropy 
available for the dynamo and other diffusive processes in MW K⁻¹. Inner core (IC) age is shown in Gyr; 
stable layer thicknesses, Δ, are given in kilometres below the CMB. 
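
The comparison between the CMB heat flux and the heat conducted down the adiabat, which underlies the statements about thermal stability at the top of the core, can be sketched as follows. The values are transcribed from Table 2 for a subset of models; the function name and the tolerance used to call a model "neutral" are our assumptions, chosen only for illustration.

```python
# (model, Q_ad in TW, Q_CMB in TW), transcribed from Table 2 for a subset of models
MODELS = [
    (1, 15.7, 5.8),
    (2, 15.2, 6.1),
    (4, 15.7, 15.6),
    (5, 15.2, 15.2),
    (6, 15.7, 11.7),
    (9, 15.7, 14.5),
]


def thermal_state(q_ad, q_cmb, tol=0.15):
    """Classify the top of the core by comparing Q_CMB with Q_ad.

    tol (TW) is a hypothetical tolerance for treating the two as equal.
    """
    if abs(q_cmb - q_ad) <= tol:
        return "neutral"
    return "stable" if q_cmb < q_ad else "unstable"


for model, q_ad, q_cmb in MODELS:
    print(f"model {model}: thermally {thermal_state(q_ad, q_cmb)} at the CMB")
```

With this tolerance only models 4 and 5 come out as thermally neutral; the other listed models carry less heat across the CMB than is conducted down the adiabat and are therefore thermally stable at the top of the core.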

We estimate stable layer thicknesses by computing the radial variation 
of thermal and compositional gradients for each model using the 
equations of a previous study (ref. 20, Methods), which are derived from 
the equations of core energetics. To compare thermal and chemical 
gradients, we multiply the latter by the ratio of compositional and 
thermal expansion coefficients, α_c/α_T, thereby converting compositional 
effects into equivalent thermal effects. The base of the stable layer is 
defined as the point where the stabilizing adiabatic gradient, T′_a, crosses 
the combined destabilizing gradient, T′ = T′_L + T′_s + T′_c + T′_r, where the 
terms represent respectively latent heat, secular cooling, compositional 
buoyancy and radiogenic heating. 
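
The crossing-point definition above can be sketched numerically. In this minimal example the radial profiles are hypothetical numbers in arbitrary units (not the profiles of Fig. 2), and the function name and the use of linear interpolation between grid points are our assumptions.

```python
def stable_layer_base(radii, stabilizing, destabilizing):
    """Radius of the outermost crossing of the two gradient profiles.

    Profiles are sampled on increasing radii. Returns None when the top of
    the core is not stable or when no crossing is found.
    """
    if destabilizing[-1] > stabilizing[-1]:
        return None  # destabilizing gradient wins at the CMB: no stable layer
    for i in range(len(radii) - 2, -1, -1):  # scan downward from the CMB
        below = destabilizing[i] - stabilizing[i]
        above = destabilizing[i + 1] - stabilizing[i + 1]
        if below > 0 >= above:
            # linear interpolation for the zero of (destabilizing - stabilizing)
            frac = below / (below - above)
            return radii[i] + frac * (radii[i + 1] - radii[i])
    return None


# Hypothetical profiles: the stabilizing gradient grows outward, while the
# destabilizing gradient is greatest at depth.
radii = [1221.0, 2000.0, 2800.0, 3485.0]   # km, from the ICB to the CMB
stabilizing = [1.0, 2.0, 3.0, 4.0]
destabilizing = [4.0, 3.5, 2.5, 1.0]

base = stable_layer_base(radii, stabilizing, destabilizing)
thickness = radii[-1] - base
print(f"base at r = {base:.0f} km, layer thickness = {thickness:.0f} km")
```

For these made-up profiles the base lies at r = 2,600 km, giving a stable layer 885 km thick below the CMB.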

Stable layer thicknesses are hundreds of kilometres in all models 
except those with cooling rates that are so rapid as to make the inner 
core too young; without compositional buoyancy the layers in all 
models except 4 and 5 span half the core (Table 2). Radiogenic heating 
thins the layers for the same cooling rate. Profiles of stabilizing and 
destabilizing gradients (Fig. 2) show that destabilizing gradients are 
greatest at depth, but much reduced compared to previous models 
because they each depend on a factor 1/k. The thermal conductivity 
increases by 50% across the core, increasing the heat conducted down 
the adiabat at depth and further reducing the power available to drive 
convection near the base of the outer core. Combined thermochemical 
profiles suggest that compositional buoyancy near the top of the core is not 
strong enough to drive convection against the adverse temperature 
gradient. 

Stable layers could be thinned or partially disturbed by convection, 
through penetration or instability, or some other effect not included in 
our simple model. A potentially more effective mechanism for inducing 
vertical mixing near the CMB is through lateral variations in CMB heat 
flux, which can drive motions without having to overcome the gravita- 
tional force. The presence of lateral variations makes the relevant heat 
flux for core mixing the maximum at the CMB, which could be as 
much as 10 times the average; this does not influence dynamo entropy 
calculations but does allow magnetic flux to be carried to the surface in 
regions of cold mantle, as is observed. 

Figure 2 | Stabilizing and destabilizing gradients for three core energetics 
models. Equivalent temperature gradients, T′, plotted against radius, r (km), for three 
core evolution models. The stabilizing gradient is due to conduction down the 
adiabat, T′_a (red lines). Compositional buoyancy is denoted by T′_c (blue lines), 
latent heat by T′_L, secular cooling by T′_s and radiogenic heating by T′_r. The total 
destabilizing thermal gradient is represented by the green lines; total 
destabilizing thermochemical gradients are represented by black lines. 
Three models from Table 2 are shown: a, model 2 (Δρ = 0.8 g cm⁻³, 
dT_o/dt = 46 K Gyr⁻¹ and h = 0); b, model 4 (Δρ = 0.6 g cm⁻³, 
dT_o/dt = 123 K Gyr⁻¹ and h = 0); c, model 9 (Δρ = 0.6 g cm⁻³, 
dT_o/dt = 8.7 K Gyr⁻¹ and h = 6.9 pW kg⁻¹). 

As well as raising k, our calculations also raise σ to about twice the 
current estimate. Two important quantities depend on σ: the magnetic 
diffusion time (the time taken for the slowest decaying dipole mode to 
fall by a factor of e in the absence of a dynamo) and the magnetic 
Reynolds number Rm, which measures the rate of generation of 
magnetic energy by a given flow. The magnetic diffusion time is 
increased to about 50 kyr. This may have significant implications for 
the theory of the secular variation: it makes the frozen flux approximation 
more accurate and lengthens the timescale of all diffusion-dominated 
processes, including polarity reversals. If current estimates 
of Rm are appropriate for the core, the increased conductivity implies 
that the geodynamo can operate on slower fluid flows and less input 
power from thermal and compositional convection. 
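
The order of magnitude of the diffusion time can be checked with the textbook free-decay time of the dipole mode in a uniformly conducting sphere, τ = μ₀σr²/π². This is a sketch under our own assumptions rather than the authors' calculation: σ is taken as the CMB mixture value from Table 1, the core radius as 3,485 km, and the function name is ours.

```python
import math

MU_0 = 4.0e-7 * math.pi        # vacuum permeability, H/m
SECONDS_PER_YEAR = 3.15576e7   # Julian year


def dipole_decay_time_years(sigma, radius_m):
    """e-folding time of the slowest-decaying dipole mode in a uniform sphere."""
    tau_seconds = MU_0 * sigma * radius_m ** 2 / math.pi ** 2
    return tau_seconds / SECONDS_PER_YEAR


# sigma = 1.11e6 ohm^-1 m^-1 (Table 1, mixture at the CMB); r = 3,485 km (assumed)
tau = dipole_decay_time_years(sigma=1.11e6, radius_m=3.485e6)
print(f"dipole decay time ~ {tau / 1e3:.0f} kyr")
```

This gives roughly 54 kyr, in line with the figure of about 50 kyr; because τ scales linearly with σ, halving the conductivity halves the decay time.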

Revised estimates of o and k calculated directly at core conditions 
have fundamental consequences for the thermochemical evolution of 
the deep Earth. New estimates of the power requirements for the 
geodynamo suggest a CMB heat flux in the upper range of what is 
considered reasonable for mantle convection unless very marginal 
dynamo action can be sustained, while a primordial inner core is only 
possible with a significant concentration of radiogenic elements in the 
core. There are objections to a high CMB heat flux and also to radiogenic 
heating in the core, but one of the two seems inevitable if we 
are to have a dynamo. If the inner core is young, these high values of 
conductivity provide further problems with maintaining a purely ther- 
mally driven dynamo. A thermally stratified layer at the top of the core 
also appears inevitable. Viable thermal history models that produce 
thin stable layers and an inner core of age ~1 Gyr are likely to require a 
fairly rapid cooling rate and some radiogenic heating. The presence of 
a stable layer, and the effects associated with an increased electrical 
conductivity, have significant implications for our understanding of 
the geomagnetic secular variation. 


METHODS SUMMARY 


Calculations were performed using DFT with the same technical parameters used 
in refs 6, 12-14. We used the VASP code, PAW potentials with the 4s¹3d⁷ valence 
configuration, the Perdew-Wang functional and a plane wave cut-off of 293 eV; 
single particle orbitals were occupied according to Fermi-Dirac statistics. We 
tested the effect on the conductivity of the inclusion in valence of semi-core 3s 
and 3p states; we found that, as in the zero pressure case, this effect is completely 
negligible. 

The electrical conductivity and the electronic component of the thermal 
conductivity have been calculated using the Kubo-Greenwood formula and the 
Chester-Thellung-Kubo-Greenwood formula as implemented in VASP. 
Because of the low mass of the electrons compared to the ions, the conductivities 
may be calculated by assuming frozen ionic configurations, and averaging over a 
sufficiently large set representing the typical distribution of the ions at the pres- 
sures and temperatures of interest. 

Molecular dynamics simulations were performed in the canonical ensemble 
using cubic simulation cells with 157 atoms and the Γ point, a time step of 1 fs, 
and an efficient extrapolation of the charge density which speeds up the simula- 
tions by roughly a factor of two (ref. 33). Each state point was simulated for at 
least 6 ps, from which we discarded the first picosecond to allow for equilibration 
and used the last 5 ps to extract 40 configurations separated by 0.125 ps. This 
time interval is roughly two times longer than the correlation time, and therefore 
the configurations are statistically independent from each other. Because of the 
high temperatures involved, the conductivities converge quickly with respect to 
k-point sampling and size of the simulation cell, and we found that with 157-atom 
cells and the single k-point (1/4,1/4,1/4) the results are converged to better 
than 1%. 

The ionic component of the thermal conductivity was calculated using the 
Green-Kubo formula. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

First principles calculations. Calculations were performed using DFT with the 
same technical parameters used in refs 6, 12-14. We used the VASP code, PAW 
potentials with the 4s¹3d⁷ valence configuration, the Perdew-Wang functional and a 
plane wave cut-off of 293 eV; single particle orbitals were occupied according 
to Fermi-Dirac statistics. We tested the effect on the conductivity of the inclusion 
in valence of semi-core 3s and 3p states; we found that, as in the zero pressure 
case, this effect is completely negligible. 

The electrical conductivity and the electronic component of the thermal 
conductivity have been calculated using the Kubo-Greenwood formula and the 
Chester-Thellung-Kubo-Greenwood formula as implemented in VASP. 
Because of the low mass of the electrons compared to the ions, the conductivities 
may be calculated by assuming frozen ionic configurations, and averaging over a 
sufficiently large set representing the typical distribution of the ions at the pres- 
sures and temperatures of interest. 

Molecular dynamics simulations were performed in the canonical ensemble 
using cubic simulation cells with 157 atoms and the Γ point, a time step of 1 fs, 
and an efficient extrapolation of the charge density which speeds up the simula- 
tions by roughly a factor of two (ref. 33). Each state point was simulated for at least 
6 ps, from which we discarded the first picosecond to allow for equilibration and 
used the last 5 ps to extract 40 configurations separated by 0.125 ps. This time 
interval is roughly two times longer than the correlation time, and therefore the 
configurations are statistically independent from each other. Because of the high 
temperatures involved, the conductivities converge quickly with respect to k-point 
sampling and size of the simulation cell, and we found that with 157-atom cells 
and the single k-point (1/4,1/4,1/4) the results are converged to better than 1%. 
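
The sampling scheme described above (1 fs time step, 6 ps run, first picosecond discarded, 40 configurations 0.125 ps apart) can be written out as a list of step indices. This is only a sketch: the function name is ours, and taking the configuration at the end of each 0.125 ps interval is an assumption, since the text does not say which frame in each interval was used.

```python
def sample_steps(total_ps=6.0, discard_ps=1.0, spacing_ps=0.125, dt_fs=1.0):
    """Molecular dynamics step indices of the configurations kept for averaging."""
    steps_per_ps = round(1000.0 / dt_fs)
    start = round(discard_ps * steps_per_ps)    # end of the equilibration period
    stride = round(spacing_ps * steps_per_ps)   # steps between kept configurations
    end = round(total_ps * steps_per_ps)
    # assumed convention: keep the last frame of each spacing interval
    return list(range(start + stride, end + 1, stride))


steps = sample_steps()
print(len(steps), steps[0], steps[-1])
```

With the default arguments this yields 40 configurations, the first at step 1,125 and the last at step 6,000, consistent with 5 ps of production run divided into 0.125 ps intervals.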

The ionic component of the thermal conductivity was calculated using the 
Green-Kubo formula. 
Power estimates for the geodynamo. Estimates of the power required to drive the 
geodynamo are obtained by considering the slow evolution of the Earth using 
equations describing the balances of energy and entropy in the core. A detailed 
derivation of these equations can be found in a previous study. Conservation of 
energy simply equates the heat crossing the CMB to the sources within: specific 
heat of cooling Q_s, latent heat of freezing Q_L, radiogenic heating Q_r, gravitational 
energy loss Q_g that is converted into heat by the frictional processes associated with 
the convection (almost entirely magnetic), and smaller terms involving pressure 
changes and chemistry that we shall ignore: 

$$Q_{\mathrm{CMB}} = Q_s + Q_L + Q_g + Q_r \qquad (1)$$

All terms on the right-hand side of equation (1) can be written in terms of either 
the cooling rate at the CMB, dT_o/dt, or the amount of radiogenic heating, h. There 
is no dependence on the conductivities or the magnetic field, which are merely 
agents by which energy is converted to heat within the core. 

These quantities do enter the entropy balance, however. This equation has dis- 
sipation terms from thermal and electrical conduction, plus viscosity and molecular 
diffusion. They are all positive because of the second law of thermodynamics. They 
are balanced by entropies associated with the power driving the convection: heat 
pumped in at a higher temperature and removed at a lower temperature (T_CMB) and 
gravitational energy that directly stirs the core and is converted to heat by frictional 
processes, the heat then being convected and conducted away. Note that entropy 
from heat is multiplied by a Carnot-like ‘efficiency factor’, 1/T_out − 1/T_in (latent 
heat is the most efficient because it is released at the highest temperature and 
removed at the lowest), while the gravitational energy is not, E_g = Q_g/T_CMB. 
Gravitational energy is more efficient at removing entropy and therefore more 
efficient than heat at generating magnetic field. 


$$E = E_s + E_L + E_r + E_g = E_k + E_\alpha + E_J \qquad (2)$$

where the four terms on the left-hand side of the second equality represent secular 
cooling, latent heat release, radiogenic heating and gravitational energy loss. 
Adiabatic conduction entropy, E_k, is easily estimated from the thermal conductivity 
and adiabatic gradient and is large, of order 10⁹ W K⁻¹. The new estimate of 
conductivity doubles older ones and the higher ICB temperatures increase it still 
further. Barodiffusion, E_α, is the tendency for light elements to migrate down a 
pressure gradient and its associated entropy is significant but small, not exceeding 
2.5 MW K⁻¹ in any of our estimates. Diffusional processes associated with convection 
and the geodynamo also produce entropy, denoted E_J, mainly in the small 
scales. This presents a problem in estimation because the dominant contribution 
comes from magnetic fields, fluid flows, temperature and compositional fluctua- 
tions that cannot be observed and, in many cases, cannot even be simulated 
numerically. A low value of the power required to drive the dynamo, 0.5 TW 
(ref. 34), was obtained from a numerical dynamo simulation, which at an average 
temperature of 5,000 K translates into E_J = 10⁸ W K⁻¹, an order of magnitude 
lower than E_k, but the numerical simulation necessarily reduces small scale magnetic 
fields and the value for the Earth could be much larger. It may well be that 
future numerical simulations with higher resolution will have higher ohmic dissipation 
approaching E_k. Magnetic diffusivity is much larger than any other 
diffusivity in the core, by many orders of magnitude, and in numerical simulations 
the viscosity, thermal, and molecular diffusivities are replaced with turbulent 
values to account for unresolved, turbulent, small scale fields. Even so, the assoc- 
iated entropies remain much smaller than those associated with magnetic fields: 
they are generally ignored, although we should bear in mind that they are all 
positive and could make a contribution. 
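
Two pieces of arithmetic in the entropy discussion above can be checked directly. The sketch below is illustrative only: the function names are ours, the temperatures are the T_ICB and T_CMB of the Δρ = 0.6 g cm⁻³ row of Table 1, and weighting gravitational energy by 1/T_CMB reflects our reading of the expression for E_g.

```python
def entropy_from_power(power_w, temperature_k):
    """Entropy production rate (W/K) of power dissipated at a given temperature."""
    return power_w / temperature_k


def carnot_factor(t_in, t_out):
    """Carnot-like efficiency factor 1/T_out - 1/T_in, in K^-1."""
    return 1.0 / t_out - 1.0 / t_in


# 0.5 TW of dynamo power at an average temperature of 5,000 K
e_j = entropy_from_power(0.5e12, 5000.0)

# Entropy per unit heat: latent heat released at the ICB and removed at the CMB,
# compared with gravitational energy weighted by 1/T_CMB (assumed).
T_ICB, T_CMB = 5700.0, 4186.0
latent = carnot_factor(T_ICB, T_CMB)
gravitational = 1.0 / T_CMB

print(f"E_J = {e_j:.1e} W/K")
print(f"gravitational / latent entropy per unit power = {gravitational / latent:.1f}")
```

The first result is 10⁸ W K⁻¹. The second shows that, for these temperatures, a watt of gravitational energy contributes nearly four times as much entropy as a watt of latent heat, which illustrates why compositional convection is the more efficient power source.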

Parameter values used to calculate thermal contributions to the energy and 
entropy balances, equations (1) and (2), are taken from Table 1 of a previous 
study, except for the thermal conductivity and the temperatures of the CMB 
and ICB, which are taken from the present study. Latent heat, Q_L, depends on t, 
the difference between the melting and adiabatic gradients at the ICB; the value 
for the former is taken to be 9 K GPa⁻¹ (ref. 36), while the value of the latter is 
calculated from Fig. 1 of this study. Parameter values used to calculate compositional 
terms differ slightly from previous work, owing to their use of different 
concentrations for the light elements O, Si and S in the outer core. Concentration 
enters the calculation of gravitational energy through equation (9) of ref. 5, which, 
along with equation (8) of ref. 5, is used to define Q_g in equation (18) of ref. 5. 
Note also that Q_g depends on t. The remaining changes affect the barodiffusion, E_α, 
which makes a small contribution to the entropy budget (2); for completeness we 
list the new parameter values required to determine E_α in Supplementary Tables 1 
and 2. 
Estimating stable layer thicknesses. Radial profiles of the thermal and composi- 
tional energy sources that power the dynamo are determined using the equations 
of a previous study, which are derived from the energy balance appropriate for 
the outer core. The radial profiles represent conductive solutions that satisfy the 
total CMB heat-flux boundary condition for the temperature, zero CMB mass flux 
of light elements, and fixed temperature and light element concentration at the 
ICB. Superimposed on this basic state are the small fluctuations associated with 
core convection and the dynamo process. 

These radial profiles apply to a Boussinesq fluid and hence neglect compressi- 
bility effects other than when they act to modify gravity. This necessitates the use of 
an approximate form for the adiabatic temperature, a simple choice being a 
quadratic equation expressed in terms of the ICB and CMB temperatures. 
Despite these simplifications, the CMB heat fluxes computed from equations 
(23)-(27) of the incompressible model are in good agreement with those 
obtained from the original equations (see Supplementary Table 3), while the 
quadratic approximation for the adiabat differs by at most 10K from the full 
calculation shown in Fig. 1. 

Compositional buoyancy is at least as important for driving the geodynamo as 
thermal buoyancy (see, for example, ref. 5) and so we require a means of com- 
paring the two in radial profiles, which is readily achieved by multiplying the 
former by the ratio of compositional and thermal expansion coefficients, α_c/α_T. 
This simple device converts compositional effects into equivalent thermal 
effects, thereby allowing all sources of buoyancy to be combined; it is also related 
to the condition of neutral stability discussed below. (However, it must be 
understood that the compositional term resulting from this transformation 
has nothing to do with the gravitational energy, Q_g, which is neglected in the 
Boussinesq equations.) We use the common approach (see, for example, ref. 
37) of defining all fluxes that represent sources of buoyancy associated with the 
convection in terms of a turbulent diffusivity, which is assumed constant. By 
contrast, the heat flux due to conduction down the adiabatic gradient and 
the equivalent thermal flux due to barodiffusion must be defined in terms of 
molecular quantities. 

The depth variation of the molecular thermal conductivity obtained from the 
DFT results is readily incorporated into the formulation of previous work. We 
write k = k(r) to express the radial variation of the molecular thermal conductivity; 
equation (8) from ref. 20 must then be replaced by q_a = ∇·(k(r)∇T_a), where ∇T_a is 
calculated from equation (12) in ref. 20. k(r) is well-approximated by a parabolic 
conductivity variation, k(r) = ar² + br + c, which we use to calculate the heat flux 
down the adiabatic gradient. 
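
A parabolic conductivity profile of this kind, and the heat conducted down the adiabat that follows from it, can be sketched as below. Only the end-point conductivities (150 and 100 W m⁻¹ K⁻¹ at the ICB and CMB, from Table 1) come from the paper; the mid-depth value, the adiabatic gradient of 1 K km⁻¹ at the CMB and all function names are hypothetical choices made for illustration.

```python
import math


def fit_parabola(points):
    """Coefficients (a, b, c) of k(r) = a*r**2 + b*r + c through three points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    a = y0 / d0 + y1 / d1 + y2 / d2
    b = -(y0 * (x1 + x2) / d0 + y1 * (x0 + x2) / d1 + y2 * (x0 + x1) / d2)
    c = y0 * x1 * x2 / d0 + y1 * x0 * x2 / d1 + y2 * x0 * x1 / d2
    return a, b, c


def conductivity(r_km, coeffs):
    """Evaluate the parabolic conductivity profile at radius r (km)."""
    a, b, c = coeffs
    return a * r_km ** 2 + b * r_km + c


def adiabatic_heat_flow(r_km, k, gradient_k_per_km):
    """Heat conducted down the adiabat, 4*pi*r^2*k*|dT/dr|, in watts."""
    r_m = r_km * 1.0e3
    return 4.0 * math.pi * r_m ** 2 * k * abs(gradient_k_per_km) / 1.0e3


# (radius in km, k in W/m/K); the middle point is a hypothetical value
coeffs = fit_parabola([(1221.0, 150.0), (2353.0, 130.0), (3485.0, 100.0)])
k_cmb = conductivity(3485.0, coeffs)
q_ad = adiabatic_heat_flow(3485.0, k_cmb, gradient_k_per_km=-1.0)
print(f"k(CMB) = {k_cmb:.0f} W/m/K, Q_ad ~ {q_ad / 1e12:.1f} TW")
```

With the assumed gradient of 1 K km⁻¹ the heat flow at the CMB comes out near 15 TW, the same order as the adiabatic heat flux quoted in the paper.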

To investigate the presence of a stable layer, we use temperature gradients 
instead of heat fluxes, which are calculated using equations (30)-(34) of a previous 
study with k(r) replacing k in the numerator of equation (30) of ref. 20. The 
parameter values are the same as those used to estimate power requirements 
above. We define the base of the stable layer to be the point of neutral stability 
as given by Schwarzschild’s criterion: 
$$\alpha_T\left(\frac{dT}{dr} - \frac{dT_a}{dr}\right) + \alpha_c\,\frac{dc}{dr} = 0 \qquad (3)$$


where dT/dr is the total temperature gradient, dT_a/dr is the adiabatic temperature 
gradient and dc/dr is the total compositional gradient. We write this condition as 
T′ = T′_L + T′_s + T′_c + T′_r − T′_a = 0, where the terms represent respectively latent heat, 
secular cooling, compositional buoyancy, radiogenic heating and the adiabat, and 
prime indicates differentiation with respect to r (the barodiffusive contribution to 
dc/dr is very small and has been omitted). Possible deviations from the layer 
thicknesses we obtain using this definition can only be obtained by solving the 
complete dynamo equations with correct parameters for the Earth, which is 
impossible at present. We believe this to be the best definition of the base of the 
layer given the nature of our thermodynamic model. 
Clusters of iron-rich cells in the upper beak of pigeons 
are macrophages not magnetosensitive neurons 


Christoph Daniel Treiber, Marion Claudia Salzer, Johannes Riegler, Nathaniel Edelman, Cristina Sugar, Martin Breuss, 
Paul Pichler, Herve Cadiou, Martin Saunders, Mark Lythgoe, Jeremy Shaw & David Anthony Keays 


Understanding the molecular and cellular mechanisms that mediate 
magnetosensation in vertebrates is a formidable scientific problem. 
One hypothesis is that magnetic information is transduced into 
neuronal impulses by using a magnetite-based magnetoreceptor. 
Previous studies claim to have identified a magnetic sense system in 
the pigeon, common to avian species, which consists of magnetite-containing 
trigeminal afferents located at six specific loci in the 
rostral subepidermis of the beak. These studies have been widely 
accepted in the field and heavily relied upon by both behavioural 
biologists and physicists. Here we show that clusters of iron-rich 
cells in the rostro-medial upper beak of the pigeon Columba livia are 
macrophages, not magnetosensitive neurons. Our systematic 
characterization of the pigeon upper beak identified iron-rich cells 
in the stratum laxum of the subepidermis, the basal region of the 
respiratory epithelium and the apex of feather follicles. Using a three- 
dimensional blueprint of the pigeon beak created by magnetic res- 
onance imaging and computed tomography, we mapped the location 
of iron-rich cells, revealing unexpected variation in their distribution 
and number—an observation that is inconsistent with a role in mag- 
netic sensation. Ultrastructure analysis of these cells, which are not 
unique to the beak, showed that their subcellular architecture 
includes ferritin-like granules, siderosomes, haemosiderin and 
filopodia, characteristics of iron-rich macrophages. Our conclusion 
that these cells are macrophages and not magnetosensitive neurons is 
supported by immunohistological studies showing co-localization 
with the antigen-presenting molecule major histocompatibility com- 
plex class II. Our work necessitates a renewed search for the true 
magnetite-dependent magnetoreceptor in birds. 

Each year millions of birds complete lengthy journeys guided by the 
Earth’s magnetic field. Current evidence indicates that the detection of 
magnetic fields is mediated by an inclination-sensitive light-dependent 
compass that resides in the retina, and an intensity-sensitive 
apparatus that is believed to provide information about the magnetic 
map, is associated with the trigeminal nerve, and is thought to rely on 
biogenic magnetite (Fe₃O₄). The trigeminal nerve was first implicated 
in magnetoreception in the bobolink Dolichonyx oryzivorus, 
and was suggested to be sensitive to small alterations in magnetic 
stimuli. Subsequent studies revealed that the ophthalmic branch is 
required for pigeons to perform an intensity-based conditioning task, 
and that neurons in the trigeminal brainstem complex of European 
robins are activated when the birds are subjected to non-uniform 
magnetic fields. These data have led to the proposition that the 
sensory cells responsible for magnetite-based magnetoreception lie 
in the upper beak of birds. Previous studies have claimed that 
clusters of iron-containing neurons in six specific bilateral locations 
in the rostral dermis of the upper beak of pigeons constitute a magnetic 
sense system. It has been contended that this system consists of 
unmyelinated dendrites that contain superparamagnetic spherules 
surrounded by iron platelets that are composed of magnetite and 
maghemite, and that the system is a common sensory apparatus in 
birds. These assertions have formed the basis for a host of behavioural 
studies and theoretical calculations that aim to advance the 
magnetite theory of magnetoreception. 

To investigate this putative magnetic sense system, we undertook a 
systematic analysis of the prevalence and distribution of all iron-rich 
cells in the upper beak of the pigeon. We perfused adult pigeons 
(Nuremberg cohort, n = 12), and sectioned the upper beak from the 
caudal respiratory concha to the tip of the beak in the coronal plane 
(Fig. 1a, b). We stained serial sections (10 µm) with Prussian blue (PB)
to label ferric iron, and nuclear fast red (NFR) to identify nuclei, 
followed by counting of all PB-positive cells. We consistently observed 
PB-positive cells in three specific regions: (1) in the stratum laxum of 
the dorsal and/or ventral subepidermis; (2) in the buds of feather 
follicles; and (3) in the basal region of the respiratory epithelium 
(Fig. 1c-e). We confirmed this pattern of staining in a larger collection 
of pigeons originating from seven different lofts (n = 172). PB-positive 
cells in all three regions were characterized by the presence of multiple 
dark blue spherules (0.25-5.0 µm in size) and/or by light blue cytoplasmic
staining with a notable nucleus (Fig. 1g-k and Supplementary Figs 1-3). 
Subepidermal PB-positive cells in caudal and medial regions were pre- 
dominantly found in the dorsal subepidermis (Fig. 1c), whereas those 
PB-positive cells located rostrally were found in the ventral subepidermis 
lining the inner roof of the beak (Fig. 1f and Supplementary Fig. 1). PB- 
positive cells in the feather follicle clustered in the apical region of the 
bud (Fig. 1d and Supplementary Fig. 2), and those in the respiratory 
epithelium were predominantly found within the lateral edges of the 
concha (Fig. 1e and Supplementary Fig. 3).

As it is believed that iron-rich cells in the upper pigeon beak are 
limited to six discrete bilateral anatomical loci®, we mapped the distri- 
bution of PB-positive cells along the rostro-caudal axis of the beak. To 
do this accurately we first created a three-dimensional topographic 
map of the pigeon beak by undertaking high-resolution magnetic 
resonance imaging (MRI) coupled with micro-computed tomography 
(micro-CT) scanning, identifying four specific anatomical landmarks 
(Supplementary Movies 1, 2 and Supplementary Figs 4, 5). After stain- 
ing serial sections, we counted PB-positive cells and used our land- 
marks to map the distribution of cells along the rostro-caudal axis 
(n = 12). We found that PB-positive cells in the respiratory epithelium 
and feather follicles were restricted to caudal regions, whereas those in 
the subepidermal region were found in clusters along the length of the 
beak with no apparent bilateralization (Fig. 2a-c and Supplementary
Figs 6, 7 and Supplementary Table 1). We found no significant
differences in the total number of PB-positive cells between sexes 
(respiratory epithelium (P > 0.5), subepidermis (P > 0.1), feather
follicle (P > 0.1)) (Supplementary Fig. 8), but observed an extremely 
large variation in the number and distribution of PB-positive cells 
when comparing birds of the same age and sex. For instance, pigeon 
200 had ~200 PB-positive cells in the subepidermis, whereas pigeon 

Figure 1 | PB-positive cells in the upper beak. a, Schematic showing the head
of an adult pigeon. The area shaded red indicates the region of the beak that was
sectioned, stained and screened for PB-positive cells. br, brain; ce, cere; co, 
concha; oe, olfactory epithelium. b, Upper beak coronal section stained with PB 
and NFR (the rostro-caudal location of the section is indicated with a red line in 
a). Scale bar, 2 mm. c, Schematic of a coronal section highlighting the non- 
uniform distribution of PB-positive cells in the subepidermal region in four 
different birds. d, e, Schematic of a feather follicle and concha showing the 
location of PB-positive cells in four birds. f, Schematic of a coronal section 
rostral to the cere, showing the non-uniform distribution of PB-positive cells in 
the subepidermis in three different birds. g, Quantification of NFR staining in 
PB-positive cells revealing that at least 97% of cells in all regions are nucleated 
(n = 5 birds, n = 180 cells). d, dorsal; v, ventral. Error bars show the standard
deviation. h-k, Representative PB-positive cells in which the nuclei are visible
(cell membranes are highlighted by a dashed line, nuclei are marked with n). See
also Supplementary Figs 1-3. Scale bar, 1 µm.


203 had ~108,800 PB-positive cells located in numerous clusters along
the length of the beak. Although our serial quantification samples a
section every 120 µm, we did not find the six 350-µm-long bilateral
clusters that are claimed to constitute a magnetic sense system’. We 
speculated that our pigeon strain might harbour a large genomic dele- 
tion that accounts for the absence of this putative magnetosensitive 
system. To investigate this we quantified PB-positive cells in pigeons 
from another loft (Vienna cohort, n = 6). Similar to our Nuremberg 
cohort we did not find six bilateral clusters, and once again observed a 
large variation in the distribution and number of PB-positive cells 
(Supplementary Fig. 9 and Supplementary Table 2). This variation is 
not consistent with a genetically encoded sensory apparatus responsible 
for magnetosensation. 
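
The sampling argument above can be checked with a short Python sketch. It counts how many stained sections would cross a 350-µm-long cluster, assuming the scheme given in the Methods (10-µm sections, four consecutive sections per slide, every third slide stained, hence a 120-µm period). The function name, the parameter names and the assumption that a cluster starts at a section boundary are illustrative and are not part of the original analysis.

```python
def stained_sections_hit(cluster_um=350, section_um=10,
                         sections_per_slide=4, slide_stride=3):
    """Return (min, max) number of stained sections crossing a cluster.

    Assumption: stained slides repeat with a period of
    sections_per_slide * slide_stride sections (12 sections = 120 um here),
    and the cluster may start at any section boundary within that period.
    """
    period = sections_per_slide * slide_stride
    span = cluster_um // section_um            # sections covered by the cluster
    counts = []
    for offset in range(period):               # every possible alignment
        hits = sum(1 for i in range(offset, offset + span)
                   if i % period < sections_per_slide)
        counts.append(hits)
    return min(counts), max(counts)


print(stained_sections_hit())
```

Under these assumptions every alignment of a 350-µm cluster is crossed by several stained sections, so clusters of the reported size should not fall entirely between sampled planes.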

Next we asked whether PB-positive cells are neurons by triple stain- 
ing sections with PB, NFR, and one of three different antibodies that 
label neuronal structures: neurofilament (NF), TUBB3 and MAP1B 
(n = 5 birds). In the respiratory epithelium we observed 0.04% co-localization
with NF (n = 1,208 cells), 0.6% co-localization with
TUBB3 (n = 2,818 cells), and 0.01% co-localization with MAP1B
(n = 2,213 cells). In the subepidermis we found no co-localization with
NF (n = 471 cells) or MAP1B (n = 803 cells), and only 0.06% co-localization
with TUBB3 (n = 1,309 cells). Finally, in the feather follicle
we found no co-localization with NF (n = 286 cells) or MAP1B
(n = 295 cells), and only 0.24% co-localization with TUBB3 (n = 407
cells) (Supplementary Figs 10, 11). The simplest explanation for the 

Figure 2 | Number and distribution of PB-positive cells in the upper beak.
a-c, Plots showing the number of PB-positive cells per 500-µm increment along
the length of the beak in the respiratory epithelium (blue dots) (a),
subepidermis (red dots) (b) and the feather follicle (green dots) (c; n = 12
birds). Each dot represents the number of PB-positive cells in an individual bird
in that increment, with males (n = 6) shown in dark colours and females (n = 6)
in light colours. PB-positive cells in the feather follicle and respiratory
epithelium are predominantly found in caudal regions, whereas cells in the 
subepidermis are found along the length of the beak but are most prevalent 
beneath the cere. Total cell numbers per increment are plotted on a logarithmic 
axis, highlighting the large variation between birds. Individual birds are shown 
in Supplementary Figs 6 and 7. 


very small amount of apparent co-localization we observed is that two 
cells, one PB positive and the other positive for a neuronal marker, lie in 
the same vertical plane, and because of the nature of the chemical stain 
used cannot be distinguished from one another. Taken together, our 
results strongly suggest that the clusters of PB-positive cells in the beak 
of the pigeon are not neurons. 

To ascertain the true identity of the PB-positive cells, we undertook 
an analysis of their ultrastructure using transmission electron microscopy
(TEM) (n = 3 birds) (Supplementary Fig. 12). We observed
ferritin-like granules (6-9 nm) throughout the cytoplasm of PB-positive
cells from all regions and in some instances haemosiderin
masses and/or membrane-bound electron-dense organelles known as
siderosomes*'”” (~300 nm) (Fig. 3a-f and Supplementary Fig. 13).
Energy-filtered transmission electron microscopy (EFTEM) confirmed 
that each electron-dense granule was composed of iron (Supplementary 




Figure 3 | Ultrastructure of PB-positive cells. a-f, Representative electron
micrographs of two different PB-positive cells in the respiratory epithelium
(a, b), subepidermal region (c, d) and the feather follicle (e, f; n = 3 birds). Cells in
all regions were found to contain ferritin-like granules (6-9 nm in diameter) that
are present throughout the cytoplasm. In addition, we observed haemosiderin (he)
clumps and/or membrane-bound siderosomes (si). Osmophilic lipid droplets
(os) are visible in cells in the respiratory epithelium in b. Cells in the subepidermal
region and feather follicle were noted for their slender cytoplasmic projections
resembling filopodia, which are seen to engulf a cell in d. Cells in all regions were
nucleated (nu). See also Supplementary Fig. 13. Scale bars, 1 µm.


Fig. 14). Selected area electron diffraction (SAED) failed to identify 
any cellular structures that contained magnetite, but showed that 
haemosiderin masses in the feather follicle consist of a goethite-like 
material (n = 3 birds), whereas siderosomes in the respiratory
epithelium are composed of ferrihydrite (n = 2 birds) (Supplementary
Fig. 15 and Supplementary Table 3). On a cellular level, PB-positive
cells in the respiratory epithelium are characterized by the
presence of osmophilic lipid vacuoles (Fig. 3b), whereas those cells 
originating from the feather follicle and subepidermal region had 
notable dendritic extensions that resembled filopodia (Supplemen- 
tary Fig. 13d-i, k). In some instances these cytoplasmic tentacles 
appeared to engulf neighbouring cells, suggesting to us that the PB 
cells may be phagocytic macrophages (Fig. 3d). 

Macrophages are known to reside in the spleen, dermis and respiratory 
mucosa of multiple species, and to have a vital role in host defence and 
iron homeostasis”. Iron accumulates within macrophages during 
the catabolism of haemoglobin and is stored as ferritin**. In one class 
of macrophages known as siderophages, ferritin accumulates in 
membrane-bound siderosomes, which can then undergo proteolytic 
processing forming haemosiderin. This accumulation of iron renders 
these cells PB positive”*”*. To ascertain whether the PB-positive cells in 
the upper beak of the pigeon were siderophages, we stained cryosec- 
tions with sera against major histocompatibility complex class II (MHC 
II), which labels antigen-presenting cells including macrophages, 
alongside positive and negative controls (n = 4 birds) (Fig. 4a-c and
Supplementary Fig. 11)”. We observed MHC II co-localization with 
98.8% of PB-positive cells in the respiratory epithelium (n = 104 cells), 

Figure 4 | MHC II immunohistochemistry on PB-positive cells.
a-c, Representative images of coronal sections triple stained with PB, NFR, and
sera against the antigen-presenting marker MHC II (n = 4 birds). MHC II
staining was predominant on the surface of cells and found to co-localize
with ≥94% of PB-positive cells in all regions. Controls are shown in
Supplementary Fig. 11. Scale bar, 10 µm.


95% of PB-positive cells in the subepidermis (n = 92 cells), and 94.4% 
in the feather follicle (n = 205 cells). Taken together with our anatom- 
ical mapping, subcellular data and neuronal staining, we conclude that 
clusters of PB-positive cells in the upper beak of the pigeon C. livia are 
macrophages, not magnetosensitive neurons. 

As macrophages are not unique to the upper beak, our finding 
predicts that PB-positive cells should be found throughout the pigeon. 
To test this we stained skin samples from the back, abdomen, neck, 
scalp, wing and lower beak of the pigeon (n = 3 birds). This revealed 
widespread PB-positive staining in the subepidermis and feather 
follicle, which was indistinguishable from that observed in the upper 
beak (Supplementary Fig. 16). Our conclusion further predicts the 
infiltration of PB-positive macrophages in response to tissue damage 
or host invasion. We observed such a response in the beak of one of our 
pigeons (P199), where a large inflammatory lesion with a necrotic 
centre was surrounded by lymphoplasmacytic cells, including 
~80,000 PB-positive cells that were again characterized by constella- 
tions of blue spherules and/or by light blue cytoplasmic staining 
(Supplementary Fig. 17). 

Although we cannot exclude the possibility that a small number of 
sparsely distributed magnetoreceptors reside at an unknown location 
in the upper beak of pigeons, this study finds no evidence to support 
the existence of a subepidermal magnetic sense system that consists of 
iron-containing dendrites at six specific bilateral loci. This conclusion, 
which is supported by a critical analysis of the elemental composition 
of PB-positive cells in the subepidermis”, has several important impli- 
cations. First, it requires a re-evaluation of behavioural studies that 
have purported to impair the function of a magnetite-based receptor in 
the subepidermis of the upper beak and the conclusions that these 
studies reached””°. Second, it necessitates a re-assessment as to whether 
superparamagnetic magnetite has the necessary physical and magnetic 
properties to act as a magnetosensor in a living system'®''”. Third, our 
work reveals that the sensory cells that are responsible for trigeminally 
mediated magnetic sensation in birds remain undiscovered. These 
enigmatic cells may reside in the olfactory epithelium, a sensory struc- 
ture that has been implicated in magnetoreception in the rainbow 
trout”. 

METHODS SUMMARY 


Histological studies. We perfused adult pigeons with 4% phosphate-buffered 
paraformaldehyde (PFA, pH 7.4), and dissected the tissue with ceramic-coated 
tools. We embedded the tissue in paraffin and prepared 10-µm coronal sections.
We stained sections for 20 min in 5% potassium hexacyanoferrate with 10% HCl,
followed by a series of washes and a 2-min exposure to NFR. For immunohistochemistry
we incubated the sections with the primary antibody for 18 h in 0.1%
Triton PBS with 2-4% milk (pH 7.4), before detection using standard methods,
followed by PB staining. 

Ultrastructure studies. Adult pigeons were perfused with 2.5% glutaraldehyde
supplemented with 2% PFA in PBS (pH 7.4). Tissue was dissected with ceramic-coated
tools, incubated for 1 h in phosphate-buffered 2% osmium (pH 7.4),
dehydrated and embedded in epoxy resin. We prepared alternating semi-thin
(2 µm) and ultra-thin (70 nm) sections. Semi-thin sections were stained with
PB, and the ultra-thin sections were used for TEM, EFTEM and SAED.

Imaging studies. Adult pigeons were perfused with 4% PFA. Following post-fixation
and mounting, MRI was performed on a horizontal-bore 9.4 T
DirectDrive VNMRS system (Agilent Technologies), and CT on a Nucline Nano 
SPECT/CT imaging system (Mediso). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS


PB staining. We optimized our PB-staining protocol on our large cohort of
pigeons (n = 172), which we sourced from seven different lofts. We experimented
with a variety of fixatives, section thicknesses and blades before adopting
the following protocol. Adult homing pigeons (n = 12 Nuremberg cohort, n = 6
Vienna cohort) were killed before perfusion with 4% phosphate-buffered
paraformaldehyde (PFA, pH 7.4). The tissue was dissected with ceramic-coated
tools and post-fixed for 18 h before dehydration in an increasing alcohol series.
The tissue was then embedded in paraffin and sectioned coronally (10 µm) with
ceramic-coated microtome blades (DuraEdge High Profile, BLM00103P). Four
consecutive coronal sections were mounted on positively charged microscope
slides (Menzel Superfrost PLUS, Thermo Scientific). For PB staining we incubated
every third slide in a freshly prepared solution of 5% potassium hexacyanoferrate
(Sigma, P9387) in 10% HCl for 20 min. After three washes in double-distilled H2O,
sections were counterstained for 2 min in NFR (Sigma, 60700). Each slide was then
scanned (MIRAX Slide Scanner) and PB-positive cells were counted manually, using
light microscopy where necessary. For anatomical mapping, landmarks were
identified (Supplementary Fig. 5), and 500-µm-thick increments determined.
To obtain estimated total cell counts, the number of PB-positive cells counted
within a normalized increment was divided by the number of sections counted
within that increment and multiplied by a factor of 50 (see Supplementary Tables 1
and 2). We compared the number of PB-positive cells in males and females by
performing a Student’s t-test. All pigeons were sexed using genetic methods as
previously described (ref. 30), and experiments were performed in accordance with the
relevant guidelines and regulations (Magistrat 60, Veterinäramt,
MA60-001603/2010/002).
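
The count extrapolation described above can be sketched in Python as follows. The factor of 50 is taken here to be the number of 10-µm sections in a 500-µm increment, which is an inference from the stated section thickness rather than something the text spells out. The function name and the example numbers are hypothetical.

```python
def estimate_total_cells(cells_counted, sections_counted,
                         sections_per_increment=50):
    """Extrapolate the cell count of one increment from the sampled sections.

    cells_counted: PB-positive cells counted in the sampled sections.
    sections_counted: number of sections actually counted in the increment.
    sections_per_increment: assumed 500 um / 10 um = 50 sections.
    """
    if sections_counted <= 0:
        raise ValueError("at least one section must be counted")
    mean_per_section = cells_counted / sections_counted
    return mean_per_section * sections_per_increment


# Hypothetical increment: 36 cells counted across 16 stained sections.
print(estimate_total_cells(36, 16))
```

Dividing by the number of sections actually counted keeps increments with lost or damaged sections comparable with fully sampled ones.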

Immunohistochemistry. For staining with neuronal antibodies, slides were
deparaffinized and washed in PBS (pH 7.4) before incubation with the primary
antibody for 18 h in 0.1% Triton PBS with 2% milk (pH 7.4). Primary antibodies
were used at the following dilutions: NF (Millipore, MAB1621, 1:2,000),
TUBB3 (Covance, MMS-435P, 1:1,000), MAP1B (Santa Cruz, SC-58784, 1:75).
Following a series of washes in PBS, slides were incubated for 2 h with a biotinylated
secondary antibody (1:500), before visualization with a peroxidase-based
Vectastain Elite ABC kit (Vector Labs, PK-4002) and the chromogen DAB
(3,3′-diaminobenzidine, Dako). For MHC II staining, 10-µm cryosections were
prepared and quenched in 2% H2O2 in PBS for 30 min, before incubation for 18 h in 0.1%
Triton PBS with 4% milk (pH 7.4) with the primary antibody (Santa Cruz,
SC-59323, 1:500). This antibody is a mouse monoclonal antibody raised against white
blood cells originating from the chicken. To avoid cross-reaction with endogenous
biotin/avidin, an HRP-conjugated secondary antibody was used (Bio-Rad, 1:500), and
staining was visualized with DAB. Sections were thoroughly washed with PBS before PB
staining and scanning as described above. All cell counting and co-localization
studies were performed blinded to the antibody used. The overall percentage of
co-localization was determined by calculating the rate of co-localization per bird,
and ascertaining the mean.
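
The averaging step described above (a rate per bird, then the mean across birds) differs from pooling all cells, and the following Python sketch makes the distinction explicit. The function names and the example counts are hypothetical; only the per-bird averaging rule comes from the text.

```python
def mean_colocalization_percent(per_bird_counts):
    """Mean of per-bird co-localization rates, in percent.

    per_bird_counts: list of (colocalized_cells, total_cells), one per bird.
    Birds with no counted cells are skipped (an assumption of this sketch).
    """
    rates = [100.0 * coloc / total
             for coloc, total in per_bird_counts if total > 0]
    if not rates:
        raise ValueError("no birds with counted cells")
    return sum(rates) / len(rates)


def pooled_colocalization_percent(per_bird_counts):
    """Alternative shown for contrast: pool all cells before dividing."""
    coloc = sum(c for c, _ in per_bird_counts)
    total = sum(t for _, t in per_bird_counts)
    return 100.0 * coloc / total


# Hypothetical counts for two birds.
birds = [(1, 100), (0, 300)]
print(mean_colocalization_percent(birds))    # each bird weighted equally
print(pooled_colocalization_percent(birds))  # each cell weighted equally
```

Averaging per bird weights each animal equally, so one bird with many counted cells cannot dominate the reported percentage.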

Ultrastructure studies. Adult pigeons were perfused with 2.5% glutaraldehyde
supplemented with 2% PFA in PBS (pH 7.4) (Glut-PFA), before tissue dissection
with ceramic-coated tools. Following post-fixation for 48 h, this tissue was washed
with PBS, incubated for 1 h in phosphate-buffered 2% osmium (pH 7.4),
dehydrated and embedded in epoxy resin. After polymerization, the blocks were
trimmed and sectioned, alternately taking semi-thin (2 µm) and ultra-thin
(70 nm) sections. The ultra-thin sections were mounted on formvar-filmed copper
slot grids for TEM, whereas the semi-thin sections were etched with 21% sodium
ethoxide in ethanol, rehydrated and stained with PB and NFR. Where necessary, PB
staining was intensified by incubating the sections in 0.5% DAB (ref. 31). TEM imaging of
ultra-thin sections used a 100 kV electron microscope (FEI Morgagni 268D) with a
CCD camera (Morada Olympus-SIS). For EFTEM imaging and selected area electron
diffraction, the ultra-thin sections were mounted on holey carbon-filmed
copper finder grids (Quantifoil, R3.5/1) and analysed on a 200 kV TEM (JEOL,
2100) fitted with a Gatan Imaging Filter (Tridiem) and CCD camera (Orius
SC1000). For EFTEM, bright-field images were taken before obtaining elemental
maps for iron, which were acquired using the iron M-edge and generated using the
conventional three-window method. Two pre-edge (background) images were
acquired at energies of 45 and 50 eV, and the post-edge (signal energy) image
was acquired by centring the filter’s energy-selecting slit at 59 eV with a slit width
of 5 eV (~10 s acquisition time). Diffraction data were obtained from iron deposits
identified by EFTEM and calibrated against a polycrystalline gold standard.
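
As an illustration of the three-window method mentioned above, the following Python sketch extrapolates the background under the edge from the two pre-edge windows and subtracts it from the post-edge window, pixel by pixel. It assumes the commonly used power-law background model I(E) = A·E^(-r); the text does not state which background model or software was used, and the function names and the handling of non-positive counts are choices made for this sketch only.

```python
import math


def three_window_signal(pre1, pre2, post, e1=45.0, e2=50.0, e3=59.0):
    """Background-subtracted edge signal for one pixel.

    pre1, pre2: intensities in the pre-edge windows centred at e1 and e2 (eV).
    post: intensity in the post-edge window centred at e3 (eV).
    Assumes a power-law background I(E) = A * E**(-r) fitted through the two
    pre-edge windows; returns NaN where the fit is undefined.
    """
    if pre1 <= 0 or pre2 <= 0:
        return math.nan
    r = math.log(pre1 / pre2) / math.log(e2 / e1)   # power-law exponent
    a = pre1 * e1 ** r                              # power-law amplitude
    background = a * e3 ** (-r)                     # extrapolated to e3
    return post - background


def elemental_map(pre1_img, pre2_img, post_img):
    """Apply the pixel-wise subtraction to equally sized 2D lists."""
    return [[three_window_signal(p1, p2, p3)
             for p1, p2, p3 in zip(row1, row2, row3)]
            for row1, row2, row3 in zip(pre1_img, pre2_img, post_img)]
```

Pixels with a clearly positive signal after subtraction are candidates for iron; in practice the images would also need drift correction and noise thresholding, which this sketch omits.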

MRI. Animals were killed and perfused with 4% PFA as described above.
Following 18 h of post-fixation, heads were mounted in a 70-mm-diameter PE
tube filled with proton-free perfluoropolyether Fomblin (Solvay Solexis S.p.A.).
Imaging was performed on a horizontal-bore 9.4 T DirectDrive VNMRS system
(Agilent Technologies) using a 72-mm quadrature birdcage volume coil (RAPID
Biomedical GmbH). For three-dimensional imaging, a gradient-echo sequence
with the following parameters was used: echo time (TE), 2.8 ms; repetition
time (TR), 280 ms; flip angle, 40°; six averages; field of view, 70 × 70 × 35 mm;
matrix size, 512 × 512 × 256. For higher-resolution images of the beak, PFA-fixed
beaks were incubated with 8 mM gadolinium solution (Magnevist, Bayer AG) in
PBS for 48 h followed by embedding in agar containing 8 mM gadolinium.
Gradient-echo images were acquired using the following parameters: TE, 2.7 ms;
TR, 25 ms; flip angle, 45°; five averages; field of view, 28 × 28 × 35 mm; matrix
size, 560 × 560 × 700. Regions of interest were segmented using thresholds, with
manual adjustments where necessary, in Amira visualization software (v.5.2.2,
Visage Imaging).
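
The nominal voxel sizes implied by these acquisition parameters follow from dividing each field-of-view dimension by the matching matrix dimension. The short Python sketch below does this for both protocols; the function name is illustrative, and the result is the nominal sampling interval rather than a measured resolution.

```python
def voxel_size_mm(fov_mm, matrix):
    """Nominal voxel edge lengths (mm): field of view / matrix size per axis."""
    if len(fov_mm) != len(matrix):
        raise ValueError("field of view and matrix must have the same rank")
    return tuple(f / m for f, m in zip(fov_mm, matrix))


whole_head = voxel_size_mm((70, 70, 35), (512, 512, 256))
beak = voxel_size_mm((28, 28, 35), (560, 560, 700))

print([round(v * 1000, 1) for v in whole_head])  # micrometres per voxel edge
print([round(v * 1000, 1) for v in beak])
```

Both protocols come out isotropic: roughly 137 µm per voxel edge for the whole head and 50 µm for the gadolinium-stained beak.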

CT imaging. CT imaging was performed on a Nucline Nano SPECT/CT imaging
system (Mediso) using the following imaging parameters: 360 projections, pitch
0.5, 55 kVp, 145 µA, acquired at 45 µm isotropic and reconstructed to 50 µm isotropic.


30. Horng, Y. M., Wu, C. P., Wang, Y. C. & Huang, M. C. A novel molecular genetic marker
for gender determination of pigeons. Theriogenology 65, 1759-1768 (2006).

31. Moos, T. & Mollgard, K. A sensitive post-DAB enhancement technique for 
demonstration of iron in the central nervous system. Histochemistry 99, 471-475 
(1993). 

In eukaryotes, transcriptional regulation often involves multiple
long-range elements and is influenced by the genomic environ- 
ment’. A prime example of this concerns the mouse X-inactivation 
centre (Xic), which orchestrates the initiation of X-chromosome 
inactivation (XCI) by controlling the expression of the non- 
protein-coding Xist transcript. The extent of Xic sequences 
required for the proper regulation of Xist remains unknown. 
Here we use chromosome conformation capture carbon-copy 
(5C)? and super-resolution microscopy to analyse the spatial 
organization of a 4.5-megabase (Mb) region including Xist. We
discover a series of discrete 200-kilobase to 1 Mb topologically 
associating domains (TADs), present both before and after cell 
differentiation and on the active and inactive X. TADs align with, 
but do not rely on, several domain-wide features of the epigenome, 
such as H3K27me3 or H3K9me2 blocks and lamina-associated 
domains. TADs also align with coordinately regulated gene clusters. 
Disruption of a TAD boundary causes ectopic chromosomal con- 
tacts and long-range transcriptional misregulation. The Xist/Tsix 
sense/antisense unit illustrates how TADs enable the spatial 
segregation of oppositely regulated chromosomal neighbourhoods, 
with the respective promoters of Xist and Tsix lying in adjacent 
TADs, each containing their known positive regulators. We identify 
a novel distal regulatory region of Tsix within its TAD, which pro- 
duces a long intervening RNA, Linx. In addition to uncovering a 
new principle of cis-regulatory architecture of mammalian chromo- 
somes, our study sets the stage for the full genetic dissection of the 
X-inactivation centre. 

The X-inactivation centre was originally defined by deletions and 
translocations as a region spanning several megabases**, and contains 
several elements known to affect Xist activity, including its repressive 
antisense transcript Tsix and its regulators Xite, DXPas34 and Tsx.
However, additional control elements must exist, as single-copy trans- 
genes encompassing Xist and up to 460 kb of flanking sequences are 
unable to recapitulate proper Xist regulation’. To characterize the cis- 
regulatory landscape of the Xic in an unbiased approach, we performed 
5C’ across a 4.5-Mb region containing Xist. We designed 5C-Forward 
and 5C-Reverse oligonucleotides following an alternating scheme’, 
thereby simultaneously interrogating nearly 250,000 possible chromo- 
somal contacts in parallel, with a mean resolution of 10-20 kb (Fig. 1a; 
see Supplementary Methods). Analysis of undifferentiated mouse 
embryonic stem cells (ESCs) revealed that long-range (>50 kb) con- 
tacts preferentially occur within a series of discrete genomic blocks, 
each covering 0.2-1 Mb (Fig. 1b). These blocks differ from the higher- 
order organization recently observed by Hi-C’*, corresponding to 
much larger domains of open or closed chromatin, that come together 
in the nucleus to form A and B types of compartments’. Instead, our 


5C analysis shows self-associating chromosomal domains occurring at 
the sub-megabase scale. The sizes and locations of these domains are
identical in male and female mouse ESCs (Supplementary Fig. 1)
and in different mouse ESC lines (Supplementary Fig. 2 and 
Supplementary Data 1). 

To examine this organization with an alternative approach, we per- 
formed three-dimensional DNA fluorescent in situ hybridization 
(FISH) in male mouse ESCs. Nuclear distances were found to be sig- 
nificantly shorter between probes lying in the same 5C domain than in 
different domains (Fig. 1c, d), and a strong correlation was found 
between three-dimensional distances and 5C counts (Supplementary 
Fig. 3a, b). Furthermore, using pools of tiled bacterial artificial chro- 
mosome (BAC) probes spanning up to 1 Mb and structured illumina- 
tion microscopy, we found that large DNA segments belonging to the 
same 5C domain colocalize to a greater extent than DNA segments 
located in adjacent domains (Fig. 1e), and that this holds throughout the cell cycle
(Supplementary Fig. 3c, d). Based on 5C and FISH data, we conclude 
that chromatin folding at the sub-megabase scale is not random, and 
partitions this chromosomal region into a succession of topologically 
associating domains (TADs). 

We next investigated what might drive chromatin folding in TADs. 
We first noticed a striking alignment between TADs and the large 
blocks of H3K27me3 and H3K9me2 (ref. 9) that are known to exist 
throughout the mammalian genomes'”* (for example, TAD E, Fig. 2 
and Supplementary Fig. 4). We therefore examined 5C profiles of 
G9a−/− (also known as Ehmt2) mouse ESCs, which lack H3K9me2,
notably at the Xic'*, and Eed−/− mouse ESCs, which lack H3K27me3
(ref. 15). No obvious change in overall chromatin conformation was 
observed, and TADs were not affected either in size or position in these 
mutants (Fig. 2 and Supplementary Fig. 4b). Thus TAD formation is 
not due to domain-wide H3K27me3 or H3K9me2 enrichment. 
Instead, such segmental chromatin blocks might actually be delimited 
by the spatial partitioning of chromosomes into TADs. 

We then addressed whether folding in TADs is driven by discrete 
boundary elements at their borders. 5C was performed in a mouse ESC 
line carrying a 58-kb deletion (ΔXTX"*), encompassing the boundary
between the Xist and Tsix TADs (D and E; Fig. 2b). We observed 
ectopic contacts between sequences in TADs D and E and an altered 
organization of TAD E. Boundary elements can thus mediate the 
spatial segregation of neighbouring chromosomal segments. Within 
the TAD D-E boundary, a CTCF-binding site was recently implicated 
in insulating Tsix from remote regulatory influences'’. However, align- 
ment of CTCF- and cohesin-binding sites in mouse ESCs"* with our 5C 
data showed that, although these factors are present at most TAD 
boundaries (Supplementary Fig. 4), they are also frequently present 
within TADs, excluding them as the sole determinants of TAD 


1Institut Curie, 26 rue d’Ulm, Paris F-75248, France. 2CNRS UMR3215, Paris F-75248, France. 3INSERM U934, Paris F-75248, France. 4Programs in Systems Biology and Gene Function and Expression,
Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605-0103, USA. 5INSERM U900, Paris, F-75248 France. 6Mines
ParisTech, Fontainebleau, F-77300 France. 7Institute of Pathology, Charité-Universitätsmedizin, 10117 Berlin, and Institute of Theoretical Biology Humboldt Universität, 10115 Berlin, Germany.
8Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California 94158-2517, USA. 9Department of Reproduction and Development, Erasmus MC, University
Medical Center, 3000 CA Rotterdam, The Netherlands.
*These authors contributed equally to this work.




[Figure 1, panels a-c: image not reproduced. Recoverable labels: 5C-Forward, 5C-Reverse and unused (repeat) restriction fragments; genes and regulatory loci including Xite, Tsix, Xist, Ftx, Cdx4, Chic1 and Tsx; chrX coordinates 99-103 Mb; TADs; fosmid probes and BAC probe pools I and II; colour scale, median count in 30-kb window.]


Figure 1 | Chromosome partitioning into topologically associating 
domains (TADs). a, Distribution of 5C-Forward and 5C-Reverse HindIII 
restriction fragments across the 4.5 Mb analysed showing positions of RefSeq 
genes and known XCI regulatory loci. b, 5C data sets from XY undifferentiated 
mouse ESCs (E14), displaying median counts in 30-kb windows every 6 kb. 
Chromosomal contacts are organized into discrete genomic blocks (TADs 
A-I). A region containing segmental duplications excluded from the 5C 
analysis is masked (white). c, Positions of DNA FISH probes. d, Interphase 


positioning. Furthermore, the fact that the two neighbouring domains 
do not merge completely in ΔXTX cells (Fig. 2b) implies that additional
elements, within TADs, can act as relays when a main boundary
is removed. The factors underlying an element’s capacity to act as a 
canonical or shadow boundary remain to be investigated. 

Next we asked whether TAD organization changes during differentiation
or XCI. Both male neuronal progenitor cells (NPCs) and
male primary mouse embryonic fibroblasts (MEFs) show similar
organization to mouse ESCs, with no obvious change in TAD positioning.
However, consistent differences in the internal contacts within
TADs were observed (Fig. 3a, Supplementary Figs 2 and 5). Notably,
some TADs were found to become lamina-associated domains’ 
(LADs) at certain developmental stages (Fig. 3b). Thus chromosome 
segmentation into TADs reveals a modular framework where changes 
in chromatin structure or nuclear positioning can occur in a domain- 
wide fashion during development. 

We then assessed TAD organization on the inactive X, by combin- 
ing Xist RNA FISH, to identify the inactive X, and super-resolution 
DNA FISH using BAC probe pools on female MEFs. We found that 
colocalization indices on the inactive X were still higher for sequences 
belonging to the same TAD than for neighbouring TADs (Supplemen- 
tary Fig. 6a). However, the difference was significantly lower for the 
inactive X than for the active X. Deconvolution of the respective con- 
tributions of the active X and inactive X in 5C data from female MEFs 
(see Supplementary Methods and Supplementary Fig. 6) similarly 
revealed that global organization in TADs remains on the inactive 
X, albeit in a much attenuated form, but that specific long-range 
nuclear distances are smaller for probes in the same 5C domain. e, Structured
illumination microscopy reveals that colocalization of neighbouring sequences 
is greater when they belong to the same 5C domain. Boxplots show the 
distribution of Pearson’s correlation coefficient between red and green 
channels, with whiskers and boxes encompassing all and 50% of values, 
respectively; central bars denote the median correlation coefficient. Statistical 
significance was assessed using Wilcoxon’s rank sum test. 


contacts within TADs are lost. This, together with a recent report 
focused on longer-range interactions, suggests that the inactive X
has a more random chromosomal organization than its active homo- 
logue, even below the megabase scale. 

We next investigated how TAD organization relates to gene 
expression dynamics during early differentiation. A transcriptome 
analysis, consisting of microarray measurements at 17 time points over 
the first 84 h of female mouse ESC differentiation, was performed
(Fig. 4a). During this time window, most genes in the 5C region were 
either up- or downregulated. Statistical analysis demonstrated that 
expression profiles of genes with promoters located within the same 
TAD are correlated (Fig. 4b). This correlation (median correlation 
coefficient cc of 0.40) is significantly higher than for genes in different 
domains (cc of 0.03, P < 10⁻⁷) or for genes across the X chromosome in
randomly selected, TAD-size regions (cc of 0.09, P < 10⁻⁷). The
observed correlations within TADs seem not to depend on distance 
between genes, and are thus distinct from previously described correlations
between neighbouring genes that decay on a length scale of
approximately 100 kb (Supplementary Fig. 7). Our findings indicate 
that physical clustering within TADs may be used to coordinate gene 
expression patterns during development. Furthermore, deletion of the 
boundary between Xist and Tsix in ΔXTX cells was accompanied by
long-range transcriptional misregulation (Supplementary Fig. 8), 
underlining the role that chromosome partitioning into TADs can play 
in long-range transcriptional control. 
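
The comparison of within-TAD and between-TAD co-expression described above can be illustrated with a minimal sketch. It assumes one expression value per gene per time point and a single TAD label per gene; the gene names, TAD labels and values are invented for illustration, and the significance testing reported in Fig. 4b is not reproduced here.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def median_pair_correlations(profiles, tad_of):
    """Median correlation for gene pairs in the same TAD and in different TADs."""
    same, different = [], []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        (same if tad_of[g1] == tad_of[g2] else different).append(r)
    return median(same), median(different)


# Hypothetical toy data: four genes, four time points, two TADs.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],
}
tad_of = {"geneA": "T1", "geneB": "T1", "geneC": "T2", "geneD": "T2"}

within, between = median_pair_correlations(profiles, tad_of)
print(within, between)
```

In this toy case the two TADs have opposite trends, so pairs within a TAD correlate positively and pairs across TADs correlate negatively, similar in spirit to the anti-correlation of TAD E with its neighbours.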

A more detailed analysis of each domain (Supplementary Fig. 7) 
revealed that co-expression is particularly pronounced in TADs D, E 
Figure 2 | Determinants of topologically associating domains. a, Blocks of
contiguous enrichment in H3K27me3 or H3K9me2 (ref. 11) align with the 
position of TADs (chromatin immunoprecipitation on chip from ref. 9) in wild- 
type cells (TT2), but TADs are largely unaffected in the absence of H3K9me2 in 
male G9a−/− cells or H3K27me3 in male Eed−/− cells. b, Deletion of a
boundary at Xist/Tsix disrupts folding pattern of the two neighbouring TADs. 


and F (Fig. 4b, c). Although correlations are strongest within TADs, 
there is some correlation between TADs showing the same trend, such 
as TADs D and F, which are both downregulated during differenti- 
ation. Only TAD E, which contains Xist and all of its known positive 
regulators Jpx, Ftx, Xpr/Xpct and Rnf12 (Jpx, Ftx, Xpct and Rnf12 are
also known as Enox, B230206F22Rik, Slc16a2 and Rlim, respectively) is 
anti-correlated with most other genes in the 4.5 Mb region, being 
upregulated during differentiation (Supplementary Fig. 7). The fact 
that these coordinately upregulated loci are located in the same TAD 
suggests that they are integrated into a similar cis-regulatory network, 
potentially sharing common cis-regulatory elements. We therefore 
predict that TAD E (~550 kb) represents the minimum 5’ regulatory 
region required for accurate Xist expression, explaining why even the 
largest transgenes tested so far (covering 150 kb 5’ to Xist, Fig. 5a)
cannot recapitulate normal Xist expression.

The respective promoters of Xist and Tsix lie in two neighbouring 
TADs with transcription crossing the intervening boundary (Fig. 2b), 
consistent with previous 3C experiments. Whereas the Xist promoter
and its positive regulators are located in TAD E, the promoter of its
antisense repressor, Tsix, lies in TAD D, which extends up to Ppnx
(also known as 4930519F16Rik)/Nap1l2, more than 200 kb away
(Fig. 2b). Thus, in addition to the Xite enhancer, more distant elements 
within TAD D may participate in Tsix regulation. To test this we used 
two different single-copy transgenic mouse lines, Tg53 and Tg80 
(ref. 23). Both transgenes contain Xist, Tsix and Xite (Fig. 5a). Tg53 
encompasses the whole of TAD D, whereas Tg80 is truncated just 5’ to 
Xite (Fig. 5a and Supplementary Fig. 9). In the inner cell mass of male 
mouse embryos at embryonic day 4.0 (E4.0), Tsix transcripts could be
readily detected from Tg53, as well as from the endogenous X (Fig. 5b). 
However, no Tsix expression could be detected from Tg80, which lacks 
the distal portion of TAD D (Fig. 5b). Thus, sequences within TAD D 
must contain essential elements for the correct developmental regu- 
lation of Tsix. 

Within TAD D, several significant looping events involving the Tsix 
promoter or its enhancer Xite were detected (Figs 2b and 5a, 
Supplementary Fig. 10). Alignment of 5C maps with chromatin sig- 
natures of enhancers in mouse ESCs (Supplementary Fig. 11) sug- 
gested the existence of multiple regulatory elements within this 
region. We also identified a transcript initiating approximately 50 kb 
upstream of the Ppnx promoter (Fig. 5a), from a region bound by 
pluripotency factors and corresponding to a predicted promoter for 
a large (80 kb) intervening non-coding RNA (lincRNA, Supplementary
Fig. 12), which we termed Linx (large intervening transcript in the
Xic). Linx RNA shares several features with non-coding RNAs, such as
accumulation around its transcription site (Fig. 5c), nuclear enrichment
and abundance of the unspliced form (Supplementary Figs 12
and 13). Linx and Tsix are co-expressed in the inner cell mass of
blastocysts from E3.5-4.0 onwards, as well as in male and female 
mouse ESCs (Fig. 5c). Linx RNA is not detected earlier in embryogenesis, 
nor in extra-embryonic lineages, implying an epiblast-specific function 
examples of tissue-specific patterns). b, Lamina-associated domains (LADs,
from ref. 19) align with TADs. Chromosomal positions of tissue-specific LADs 
reflect gain of lamina association by TADs, as well as internal reorganization of 
lamina-associated TADs during differentiation. 
Figure 4 | Transcriptional co-regulation within topologically associating
domains. a, Female mouse ESCs were differentiated towards the epiblast stem
cell lineage for 84 h. Transcript levels were measured every 4-6 h at 17 different
time points by microarray analysis. b, Pearson’s correlation coefficients over all 
time points were calculated for gene pairs lying in the same TAD, pairs in 
different TADs and for pairs in randomly defined domains on the X 
chromosome that contain a similar number of genes and are of comparable size. 
Boxplots show the distribution of Pearson’s correlation coefficients, with 
whiskers and boxes encompassing all and 50% of values, respectively, and 
central bars denoting the median correlation coefficient. * represents significant 
difference with P < 10⁻⁷ using Wilcoxon’s rank sum test. c, Pearson’s
correlation coefficients for gene pairs in TADs D, E and F with red denoting 
positive and blue negative correlation. Boxes indicate the TAD boundaries. 


(Supplementary Fig. 9). Triple RNA FISH for Linx, Tsix and Xist in 
differentiating female mouse ESCs (Supplementary Fig. 14) revealed 
that before Xist upregulation, the probability of Tsix expression from 
alleles co-expressing Linx is significantly higher than from alleles that 
do not express Linx (Fig. 5d). Furthermore, Linx expression is fre- 
quently monoallelic, even before Xist upregulation (Supplementary 
Fig. 14), revealing a transcriptional asymmetry of the two Xic alleles 
before XCI. Taken together, our experiments based on 5C, transgenesis 
and RNA FISH, point towards a role for Linx in the long-range tran- 
scriptional regulation of Tsix — either through its chromosomal asso- 
ciation with Xite and/or via the RNA it produces. This analysis of the 
Xist/Tsix region illustrates how spatial compartmentalization of 
chromosomal neighbourhoods in TADs partitions the Xic into two 
large regulatory domains, with opposite transcriptional fates (Sup- 
plementary Fig. 15). 

In conclusion, our study reveals that sub-megabase folding of 
mammalian chromosomes results in the self-association of large 
chromosomal neighbourhoods in the three-dimensional space of the 
nucleus. The stability of such partitioning throughout differentiation, X 
inactivation and in cell lines with impaired histone-modifying 
machineries, indicates that this level of chromosomal organization 
may provide a basic framework onto which other domain-wide 
Figure 5 | 5C maps reveal new regulatory regions in the Xic. a, Statistically
significant looping events (5C peaks) for restriction fragments within Xite, Tsix 
promoter or Xist promoter within their respective TAD, in male (E14) mouse 
ESCs. The Tg80 YAC transgene lacks genomic elements found to interact 
physically with Xite/Tsix that are present in Tg53. b, Tsix
expression is detected in the inner cell masses of heterozygous transgenic male
E4.0 embryos by RNA FISH from single-copy paternally inherited Tg53 but not
Tg80 transgenes. Transgenic (star) and endogenous Tsix alleles (arrowhead) 
were discriminated by subsequent DNA FISH as in Supplementary Fig. 5. 

n = 20 inner cell mass cells (two embryos each). c, Linx transcripts (green,
wi1-1985N4 probe) are expressed in both E4.0 inner cell mass cells and mouse ESCs,
together with Tsix (red, DXPas34 probe), and unspliced transcripts accumulate 
locally in a characteristic cloud-like shape. d, RNA FISH in differentiating 
female mouse ESCs revealing synchronous downregulation of Linx and Tsix 
with concomitant upregulation of Xist (detected with a strand-specific probe). 
Bars are the standard deviation around the mean of three experiments. Triple- 
colour RNA FISH allows simultaneous detection of Linx, Tsix and Xist RNAs. 
Scoring of Xist-negative alleles demonstrates that before Xist upregulation Tsix 
expression is more frequent from Linx-expressing alleles than from Linx non- 
expressing alleles, at all time points tested. Presented is the mean of three 
experiments. Statistical differences were assessed using Fisher’s exact test. Cells 
were differentiated in monolayers by withdrawal of leukaemia inhibitory factor 
(LIF). 
features, such as lamina association and blocks of histone modification,
can be dynamically overlaid. Our data also point to a role for TADs in 
shaping regulatory landscapes, by defining the extent of sequences that 
belong to the same regulatory neighbourhood. We anticipate that 
TADs may underlie regulatory domains previously proposed on the 
basis of functional and synteny conservation studies. We believe
that the principles we have revealed here will not be restricted to the
Xic, as spatial partitioning of chromosomal neighbourhoods occurs
throughout the genome of mouse and human, as well as
Drosophila and E. coli. We have shown that TAD boundaries can
have a critical role in high-order chromatin folding and proper long- 
range transcriptional control. Future work will clarify the mechanisms 
driving this level of chromosomal organization, and to what extent it 
generally contributes to transcriptional regulation. In summary, our 
study provides new insights into the cis-regulatory architecture of 
chromosomes that orchestrates transcriptional dynamics during 
development, and paves the way to dissecting the constellation of 
control elements of Xist and its regulators within the Xic. 


METHODS SUMMARY 


5C was performed on mouse ESCs, mouse NPCs and primary MEFs following a 
previously described protocol with modifications, and sequenced on one lane of
an Illumina GAIIx. RNA and DNA FISH were performed on mouse ESCs and
inner cell masses extracted from pre-implantation embryos as previously
described, with modifications. Full experimental and bioinformatic methods
are detailed in Supplementary Information. 
Topological domains in mammalian genomes
identified by analysis of chromatin interactions 


Jesse R. Dixon, Siddarth Selvaraj, Feng Yue, Audrey Kim, Yan Li, Yin Shen, Ming Hu, Jun S. Liu & Bing Ren


The spatial organization of the genome is intimately linked to its 
biological function, yet our understanding of higher order genomic 
structure is coarse, fragmented and incomplete. In the nucleus of 
eukaryotic cells, interphase chromosomes occupy distinct chro- 
mosome territories, and numerous models have been proposed 
for how chromosomes fold within chromosome territories. These
models, however, provide only few mechanistic details about the 
relationship between higher order chromatin structure and genome 
function. Recent advances in genomic technologies have led to rapid 
advances in the study of three-dimensional genome organization.
In particular, Hi-C has been introduced as a method for identifying
higher order chromatin interactions genome wide. Here we
investigate the three-dimensional organization of the human and 
mouse genomes in embryonic stem cells and terminally differen- 
tiated cell types at unprecedented resolution. We identify large, 
megabase-sized local chromatin interaction domains, which we 
term ‘topological domains’, as a pervasive structural feature of the
genome organization. These domains correlate with regions of the 
genome that constrain the spread of heterochromatin. The domains 
are stable across different cell types and highly conserved across 
species, indicating that topological domains are an inherent 
property of mammalian genomes. Finally, we find that the 
boundaries of topological domains are enriched for the insulator 
binding protein CTCF, housekeeping genes, transfer RNAs and 
short interspersed element (SINE) retrotransposons, indicating 
that these factors may have a role in establishing the topological 
domain structure of the genome. 

To study chromatin structure in mammalian cells, we determined
genome-wide chromatin interaction frequencies by performing the
Hi-C experiment in mouse embryonic stem (ES) cells, human ES cells,
and human IMR90 fibroblasts. Together with Hi-C data for the mouse
cortex generated in a separate study (Y. Shen et al., manuscript in
preparation), we analysed over 1.7 billion read pairs of Hi-C data
corresponding to pluripotent and differentiated cells (Supplementary
Table 1). We normalized the Hi-C interactions for biases in the
data (Supplementary Figs 1 and 2). To validate the quality of our Hi-C
data, we compared the data with previous chromosome conformation
capture (3C), chromosome conformation capture carbon copy (5C),
and fluorescence in situ hybridization (FISH) results. Our IMR90
Hi-C data show a high degree of similarity when compared to a previously
generated 5C data set from lung fibroblasts (Supplementary Fig. 4).
In addition, our mouse ES cell Hi-C data correctly recovered a previously
described cell-type-specific interaction at the Phc1 gene
(Supplementary Fig. 5). Furthermore, the Hi-C interaction frequencies
in mouse ES cells are well correlated with the mean spatial distance
separating six loci as measured by two-dimensional FISH
(Supplementary Fig. 6), demonstrating that the normalized Hi-C data
can accurately reproduce the expected nuclear distance using an independent
method. These results demonstrate that our Hi-C data are of


high quality and accurately capture the higher order chromatin struc- 
tures in mammalian cells. 

We next visualized two-dimensional interaction matrices using a 
variety of bin sizes to identify interaction patterns revealed as a result of 
our high sequencing depth (Supplementary Fig. 7). We noticed that at 
bin sizes less than 100 kilobases (kb), highly self-interacting regions 
begin to emerge (Fig. 1a and Supplementary Fig. 7, seen as ‘triangles’ 
on the heat map). These regions, which we term topological domains, 
are bounded by narrow segments where the chromatin interactions 
appear to end abruptly. We hypothesized that these abrupt transitions 
may represent boundary regions in the genome that separate topo- 
logical domains. 

To identify systematically all such topological domains in the 
genome, we devised a simple statistic termed the directionality index 
to quantify the degree of upstream or downstream interaction bias for 
a genomic region, which varies considerably at the periphery of the 
topological domains (Fig. 1b; see Supplementary Methods for details). 
The directionality index was reproducible (Supplementary Table 2) 
and pervasive, with 52% of the genome having a directionality 
index that was not expected by random chance (Fig. 1c, false discovery
rate = 1%). We then used a Hidden Markov model (HMM) based on 
the directionality index to identify biased ‘states’ and therefore infer 
the locations of topological domains in the genome (Fig. 1a; see 
Supplementary Methods for details). The domains defined by HMM 
were reproducible between replicates (Supplementary Fig. 8). 
Therefore, we combined the data from the HindIII replicates and 
identified 2,200 topological domains in mouse ES cells with a median 
size of 880 kb that occupy ~91% of the genome (Supplementary
Fig. 9). As expected, the frequency of intra-domain interactions is 
higher than inter-domain interactions (Fig. 1d, e). Similarly, FISH 
probes in the same topological domain (Fig. 1f) are closer in nuclear
space than probes in different topological domains (Fig. 1g), despite 
similar genomic distances between probe pairs (Fig. 1h, i). These find- 
ings are best explained by a model of the organization of genomic DNA 
into spatial modules linked by short chromatin segments. We define 
the genomic regions between topological domains as either ‘topological
boundary regions’ or ‘unorganized chromatin’, depending on
their sizes (Supplementary Fig. 9). 
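
The idea behind the directionality index can be illustrated with a minimal sketch. The text above defers the exact definition to the Supplementary Methods, so the formula below is an assumption: a signed, chi-square-like statistic that compares the contacts a bin makes upstream (A) and downstream (B) with their mean E. The window size and the toy contact matrix are likewise hypothetical, and the HMM step that converts the index into domain calls is not shown.

```python
def directionality_index(matrix, window):
    """Signed upstream/downstream interaction bias for each bin.

    matrix: square list of lists of contact counts between bins.
    window: number of bins considered on each side (an assumed parameter).
    Positive values indicate downstream bias, negative values upstream bias.
    """
    n = len(matrix)
    di = []
    for i in range(n):
        upstream = sum(matrix[i][max(0, i - window):i])      # A
        downstream = sum(matrix[i][i + 1:i + 1 + window])    # B
        expected = (upstream + downstream) / 2               # E
        if expected == 0 or upstream == downstream:
            di.append(0.0)
            continue
        sign = 1.0 if downstream > upstream else -1.0
        di.append(sign * ((upstream - expected) ** 2 / expected
                          + (downstream - expected) ** 2 / expected))
    return di


# Hypothetical toy matrix: two 3-bin domains, strong contacts within a
# domain (10) and weak contacts between domains (1).
matrix = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
di = directionality_index(matrix, window=2)
print([round(v, 2) for v in di])
```

In this toy case the last bin of the first domain is biased upstream and the first bin of the second domain is biased downstream, so the sign change between bins 2 and 3 marks the boundary. Values at the two ends of the matrix reflect the missing neighbours rather than real bias.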

We next investigated the relationship between the topological 
domains and the transcriptional control process. The Hoxa locus is 
separated into two compartments by an experimentally validated insulator,
which we observed corresponds to a topological domain
boundary in both mouse (Fig. 1a) and human (Fig. 2a). Therefore,
we hypothesized that the boundaries of the topological domains might 
correspond to insulator or barrier elements. 

Many known insulator or barrier elements are bound by the zinc- 
finger-containing protein CTCF (refs 9-11). We see a strong enrich- 
ment of CTCF at the topological boundary regions (Fig. 2b and 
Supplementary Fig. 10), indicating that topological boundary regions 
Jolla, California 92093, USA. 5Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, Massachusetts 02138, USA. University of California, San Diego School of Medicine, Department of
Figure 1 | Topological domains in the mouse ES cell genome. a, Normalized
Hi-C interaction frequencies displayed as a two-dimensional heat map
overlaid on ChIP-seq data (from Y. Shen et al., manuscript in preparation),
directionality index (DI), HMM bias state calls, and domains. For both 
directionality index and HMM state calls, downstream bias (red) and upstream 
bias (green) are indicated. b, Schematic illustrating topological domains and 
resulting directional bias. c, Distribution of the directionality index (absolute 
value, in blue) compared to random (red). d, Mean interaction frequencies at all 
genomic distances from 40 kb to 2 Mb. Above 40 kb, the intra- versus inter-
domain interaction frequencies are significantly different (P < 0.005, Wilcoxon
test). e, Box plot of all interaction frequencies at 80-kb distance. Intra-domain 
interactions are enriched for high-frequency interactions. f-i, Diagram of intra- 
domain (f) and inter-domain FISH probes (g) and the genomic distance 
between pairs (h). i, Bar chart of the squared inter-probe distance (from ref. 6) for
FISH probe pairs. mESC, mouse ES cell. Error bars indicate standard error

(n = 100 for each probe pair). 


share this feature of classical insulators. A classical boundary element 
is also known to stop the spread of heterochromatin. Therefore, we 
examined the distribution of the heterochromatin mark H3K9me3 in 
humans and mice in relation to the topological domains. Indeed,
we observe a clear segregation of H3K9me3 at the boundary regions
that occurs predominantly in differentiated cells (Fig. 2d, e and
Supplementary Fig. 11). As the boundaries that we analysed in 
Figure 2 | Topological boundaries demonstrate classical insulator or
barrier element features. a, Two-dimensional heat map surrounding the Hoxa 
locus and CS5 insulator in IMR90 cells. b, Enrichment of CTCF at boundary 
regions. c, The portion of CTCF binding sites that are considered ‘associated’ 
with a boundary (a ±20-kb window is used to account for the expected uncertainty
due to 40-kb binning). d, Heat maps of H3K9me3 at boundary sites in human
and mouse. e, UCSC Genome Browser shot showing heterochromatin 
spreading in the human ES cells (hESC) and IMR90 cells. The two-dimensional 
heat map shows the interaction frequency in human ES cells. f, Heat map of 
LADs (from ref. 14) surrounding the boundary regions. Scale is the log₂ ratio of
DNA adenine methyltransferase (Dam)-lamin B1 fusion over Dam alone (Dam-
laminB1/Dam).


Fig. 2d are present in both pluripotent cells and their differentiated 
progeny, the topological domains and boundaries appear to pre-mark 
the end points of heterochromatic spreading. Therefore, the domains 
do not seem to be a consequence of the formation of heterochromatin. 
Taken together, the above observations strongly suggest that the topo- 
logical domain boundaries correlate with regions of the genome dis- 
playing classical insulator and barrier element activity, thus revealing a 
potential link between the topological domains and transcriptional
control in the mammalian genome. 

We compared the topological domains with previously described 
domain-like organizations of the genome, specifically with the A and B 
compartments described by ref. 2, with lamina-associated domains 
(LADs), replication time zones, and large organized chromatin
K9 modification (LOCK) domains. In all cases, we can see that topo-
logical domains are related to, but independent from, each of these 
previously described domain-like structures (Supplementary Figs 12- 
15). Notably, a subset of the domain boundaries we identify appear to 
mark the transition between either LAD and non-LAD regions of the 
genome (Fig. 2f and Supplementary Fig. 12), the A and B compartments
(Supplementary Figs 13 and 14), and early and late replicating chromatin
(Supplementary Fig. 14). Lastly, we can also confirm the
previously reported similarities between the A and B compartments
and early and late replication time zones (Supplementary Fig. 16).

We next compared the locations of topological boundaries iden- 
tified in both replicates of mouse ES cells and cortex, or between both 
replicates of human ES cells and IMR90 cells. In both human and 
mouse, most of the boundary regions are shared between cell types 
(Fig. 3a and Supplementary Fig. 17a), suggesting that the overall 
domain structure between cell types is largely unchanged. At the 
boundaries called in only one cell type, we noticed that the trend of
upstream and downstream bias in the directionality index is still readily 
apparent and highly reproducible between replicates (Supplementary 
Fig. 17b, c). We cannot determine if the differences in domain calls 
between cell types are due to noise in the data or to biological phenomena,
such as a change in the strength of the boundary region between cell
types. Regardless, these results indicate that the domain boundaries
are largely invariant between cell types. Lastly, only a small fraction of 
the boundaries show clear differences between two cell types, suggesting 
that a relatively rare subset of boundaries may actually differ between 
cell types (Supplementary Fig. 18). 

The stability of the domains between cell types is surprising given 
previous evidence showing cell-type-specific chromatin interactions 
and conformations. To reconcile these results, we identified cell-
type-specific chromatin interactions between mouse ES cells and mouse
cortex. We identified 9,888 dynamic interacting regions in the mouse 
genome based on 20-kb binning using a binomial test with an empirical 
false discovery rate of <1% based on random permutation of the 
replicate data. These dynamic interacting regions are enriched for 
differentially expressed genes (Fig. 3b-d, Supplementary Fig. 19 and 
Supplementary Table 5). In fact, 20% of all genes that undergo a four- 
fold change in gene expression are found at dynamic interacting loci. 
This is probably an underestimate, because by binning the genome at 
20 kb, any dynamic regulatory interaction spanning less than 20 kb will be
missed. Lastly, >96% of dynamic interacting regions occur in the same 
domain (Fig. 3e). Therefore, we favour a model where the domain 
organization is stable between cell types, but the regions within each 
domain may be dynamic, potentially taking part in cell-type-specific 
regulatory events. 
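
A binomial comparison of contact counts between two cell types can be sketched as follows. The null model used here is an assumption made for illustration, since the text does not specify it: under no change, reads for a bin pair are split between the two samples in proportion to sequencing depth. The function names and counts are hypothetical, and the permutation-based empirical false discovery rate is not reproduced.

```python
from math import comb


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial P value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n trials with success probability p.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def dynamic_interaction_p(count_a, count_b, depth_a, depth_b):
    """P value for a difference in contact counts between samples a and b.

    Assumed null: each read falls in sample a with probability
    depth_a / (depth_a + depth_b).
    """
    p_null = depth_a / (depth_a + depth_b)
    return binomial_two_sided_p(count_a, count_a + count_b, p_null)


# Hypothetical counts for one 20-kb bin pair in two equally deep samples.
p_value = dynamic_interaction_p(count_a=30, count_b=10, depth_a=1, depth_b=1)
print(p_value)
```

With equal depths, a 30 versus 10 split is unlikely under the null, so this bin pair would be a candidate dynamic interaction before any multiple-testing correction.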

The stability of the domains between cell types prompted us to 
investigate if the domain structure is also conserved across evolution. 
To address this, we compared the domain boundaries between mouse 
ES cells and human ES cells using the UCSC liftover tool. Most of the 
boundaries appear to be shared across evolution (53.8% of human 
boundaries are boundaries in mouse and 75.9% of mouse boundaries 
are boundaries in humans, compared to 21.0% and 29.0% at random, 
P value < 2.2 × 10⁻¹⁶, Fisher’s exact test; Fig. 3f). The syntenic regions
in mouse and human in particular share a high degree of similarity in 
their higher order chromatin structure (Fig. 3g, h), indicating that 
there is conservation of genomic structure beyond the primary 
sequence of DNA. 
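
The boundary comparison can be sketched in two steps: count the boundaries from one species that fall near a boundary in the other, then test the resulting 2 × 2 table. The sketch assumes that coordinates have already been converted to a common assembly (the liftover step is not shown); the tolerance, positions and table values are hypothetical, and a one-sided Fisher's exact test is shown as one possible choice.

```python
from bisect import bisect_left
from math import comb


def fraction_shared(query, reference, tolerance):
    """Fraction of query boundaries within `tolerance` bp of a reference boundary."""
    ref = sorted(reference)
    shared = 0
    for pos in query:
        # First reference boundary at or beyond the lower edge of the window.
        i = bisect_left(ref, pos - tolerance)
        if i < len(ref) and ref[i] <= pos + tolerance:
            shared += 1
    return shared / len(query)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a count of at
    least `a` in the top-left cell, given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical positions (bp) on a shared coordinate system.
print(fraction_shared([100, 500, 900], [110, 700], tolerance=20))
# Hypothetical table: shared/unshared counts for real versus random boundaries.
print(fisher_one_sided(3, 0, 0, 3))
```

A symmetric tolerance is used because binning limits how precisely a boundary can be placed; the appropriate value depends on the bin size.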

We explored what factors may contribute to the formation of topo- 
logical boundary regions in the genome. Although most topological 
boundaries are enriched for the binding of CTCF, only 15% of CTCF 
Figure 3 | Boundaries are shared across cell types and conserved in
evolution. a, Overlap of boundaries between cell types. b, Genome browser
shot of a cortex-enriched dynamic interacting region that overlaps with the
Foxg1 gene. c, Foxg1 expression in reads per kilobase per million reads
sequenced (r.p.k.m.) in mouse ES cells and cortex as measured by RNA-seq. 
d, Heat map of the gene expression ratio between mouse ES cell and cortex of 
genes at dynamic interactions. e, Pie chart of inter- and intra-domain dynamic 
interactions. f, Overlap of boundaries between syntenic mouse and human 
sequences (P < 2.2 × 10⁻¹⁶ compared to random, Fisher’s exact test).

g, h, Genome browser shots showing domain structure over a syntenic region in 
the mouse (g) and human (h) ES cells. Note: the region in humans has been 
inverted from its normal UCSC coordinates for proper display purposes. 


binding sites are located within boundary regions (Fig. 2c). Thus, 
CTCF binding alone is insufficient to demarcate domain boundaries. 
We reasoned that additional factors might be associated with topo- 
logical boundary regions. By examining the enrichment of a variety of 
histone modifications, chromatin binding proteins and transcription
factors around topological boundary regions in mouse ES cells, we 
observed that factors associated with active promoters and gene bodies 
are enriched at boundaries in both mouse and humans (Fig. 4a and 
Supplementary Figs 20-23). In contrast, non-promoter-associated
marks, such as H3K4me1 (associated with enhancers) and H3K9me3,
were not enriched or were specifically depleted at boundary regions
(Fig. 4a). Furthermore, transcription start sites (TSS) and global run-on
sequencing (GRO-seq) signal were also enriched around topological
boundaries (Fig. 4a). We found that housekeeping genes were particu- 
larly strongly enriched near topological boundary regions (Fig. 4b-d; 
see Supplementary Table 7 for complete GO terms enrichment). 
Additionally, the tRNA genes, which have the potential to function 
as boundary elements, are also enriched at boundaries (P value
<0.05, Fisher’s exact test; Fig. 4b). These results suggest that high levels 
of transcription activity may also contribute to boundary formation. In 
support of this, we can see examples of dynamic, cell-type-specific changes
in H3K4me3 at or near some cell-type-specific boundaries
(Supplementary Fig. 24). Indeed, boundaries associated with both 
CTCF and a housekeeping gene account for nearly one-third of all topo- 
logical boundaries in the genome (Fig. 4e and Supplementary Fig. 24). 
Figure 4 | Boundary regions are enriched for housekeeping genes.

a, Chromatin modifications, TSS, GRO-seq and SINE elements surrounding 
boundary regions in mouse ES cells or IMR90 cells. b, Boundaries associated 
with a CTCF binding site, housekeeping gene, or tRNA gene (purple) compared 
to expected at random (grey). c, Gene Ontology P-value chart. d, Enrichment of 
housekeeping genes (gold) and tissue-specific genes (blue) as defined by 
Shannon entropy scores near boundaries normalized for the number of genes 
in each class (TSS/10 kb/total TSS). e, Percentage of boundaries with a given 
mark within 20 kb of the boundaries. 
Finally, we analysed the enrichment of repeat classes around boundary
elements. We observed that Alu/B1 and B2 SINE retrotransposons in 
mouse and Alu SINE elements in humans are enriched at boundary 
regions (Fig. 4a and Supplementary Figs 24 and 25). In light of recent 
reports indicating that a SINE B2 element functions as a boundary in 
mice, and that SINE element retrotransposition may alter CTCF binding
sites during evolution, we believe that this contributes to a growing
body of evidence indicating a role for SINE elements in the organiza- 
tion of the genome. 

In summary, we show that the mammalian chromosomes are seg- 
mented into megabase-sized topological domains, consistent with 
some previous models of the higher order chromatin structure.
Such spatial organization seems to be a general property of the genome:
it is pervasive throughout the genome, stable across different cell types 
and highly conserved between mice and humans. 

We have identified multiple factors that are associated with the 
boundary regions separating topological domains, including the insu- 
lator binding factor CTCF, housekeeping genes and SINE elements. 
The association of housekeeping genes with boundary regions extends 
previous studies in yeast and insects and suggests that non-CTCF 
factors may also be involved in insulator/barrier functions in mam- 
malian cells. 

The topological domains we identified are well conserved between 
mice and humans. This indicates that the sequence elements and 
mechanisms that are responsible for establishing higher order struc- 
tures in the genome may be relatively ancient in evolution. A similar 
partitioning of the genome into physical domains has also been 
observed in Drosophila embryos and in high-resolution studies of 
the X-inactivation centre in mice (termed topologically associated 
domains or TADs), indicating that topological domains may be a 
fundamental organizing principle of metazoan genomes. 


METHODS SUMMARY 


Cell culture and Hi-C experiments. J1 mouse ES cells were grown on gamma- 
irradiated mouse embryonic fibroblast cells under standard conditions (85% high 
glucose DMEM, 15% HyClone FBS, 0.1 mM non-essential amino acids, 0.1 mM 
β-mercaptoethanol, 1 mM glutamine, LIF 500 U ml⁻¹, 1× Gibco penicillin/ 
streptomycin). Before collecting for Hi-C, J1 mouse ES cells were passaged onto 
feeder-free 0.2% gelatin-coated plates for at least two passages to rid the culture of 
feeder cells. H1 human ES cells and IMR90 fibroblasts were grown as previously 
described. Cells were collected for Hi-C as previously described, 
with the only modification being that the adherent cell cultures were dissociated 
with trypsin before fixation. 

Sequencing and mapping of data. Hi-C analysis and paired-end libraries were 
prepared as previously described and sequenced on the Illumina HiSeq 2000 
platform. Reads were mapped to reference human (hg18) or mouse genomes 
(mm9), and non-mapping reads and PCR duplicates were removed. Two- 
dimensional heat maps were generated as previously described. 
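
The mapping steps above can be pictured with a small sketch that drops duplicate read pairs and bins the remainder into contact counts, from which a two-dimensional heat map could be drawn. Everything specific here is an assumption for illustration: the bin size, the single-chromosome input, the function name, and the rule that read pairs with identical coordinates count as PCR duplicates. It is not the authors' pipeline.

```python
from collections import defaultdict

BIN_SIZE = 40_000  # hypothetical bin size in base pairs


def contact_counts(read_pairs, bin_size=BIN_SIZE):
    """Count contacts between genomic bins on a single chromosome.

    `read_pairs` is a hypothetical iterable of (position_a, position_b)
    tuples for mapped paired-end reads. Pairs with identical coordinates
    (in either order) are assumed to be PCR duplicates and counted once.
    """
    seen = set()
    counts = defaultdict(int)
    for pos_a, pos_b in read_pairs:
        key = (min(pos_a, pos_b), max(pos_a, pos_b))
        if key in seen:
            continue  # skip assumed PCR duplicate
        seen.add(key)
        bin_i = key[0] // bin_size
        bin_j = key[1] // bin_size
        counts[(bin_i, bin_j)] += 1
    return dict(counts)


pairs = [(1000, 90000), (90000, 1000), (5000, 85000), (45000, 50000)]
matrix = contact_counts(pairs)
```

In this toy input the second pair duplicates the first, so only three contacts are counted, keyed by (upstream bin, downstream bin).
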

Data analysis. For detailed descriptions of the data analysis, including descrip- 
tions of the directionality index, hidden Markov models, dynamic interactions 
identification, and boundary overlap between cells and across species, see 
Supplementary Methods. 
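
For orientation, the sketch below computes a directionality index for one genomic bin from its upstream and downstream contact counts. The formula is a commonly cited chi-squared-style form and is an assumption here, because the exact definition is given only in the Supplementary Methods, which are not reproduced in this text. The function name and the zero-difference convention are also illustrative choices.

```python
def directionality_index(upstream, downstream):
    """Signed bias of a bin's contacts towards upstream or downstream.

    `upstream` (A) and `downstream` (B) are contact counts between the bin
    and a fixed-size window on either side. Assumed form:
        E  = (A + B) / 2
        DI = sign(B - A) * ((A - E)**2 / E + (B - E)**2 / E)
    """
    a, b = float(upstream), float(downstream)
    if a == b:
        return 0.0  # no bias; also covers the case of no contacts at all
    expected = (a + b) / 2.0
    sign = 1.0 if b > a else -1.0
    deviation = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    return sign * deviation


downstream_biased = directionality_index(10, 30)
upstream_biased = directionality_index(30, 10)
unbiased = directionality_index(5, 5)
```

Under this assumed form, a run of upstream-biased bins followed by a run of downstream-biased bins would mark a transition of the kind a hidden Markov model could be used to segment.
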